# Pauli Propagation with CUDA-Q

This notebook reimplements the PennyLane version of Pauli propagation in CUDA-Q.

## Goals
- Quantum circuit simulation with GPU acceleration
- Operator evolution in the Heisenberg picture
- Performance comparison with the PennyLane version

In [1]:
import cudaq
import numpy as np
from typing import List, Tuple, Dict
import time

# List the available targets
print("Available CUDA-Q targets:")
for target in cudaq.get_targets():
    print(f"  - {target}")

# Select a target for GPU acceleration (use nvidia-mgpu if it is available)
# cudaq.set_target("nvidia")

Available CUDA-Q targets:
  - Target tensornet-mps
	simulator=tensornet_mps
	platform=default
	description=cutensornet simulator backend target based on matrix product state representation
	precision=fp64
Supported Arguments:
  - option (Specify the target options as a comma-separated list.
Supported options are 'fp32', 'fp64')

  - Target stim
	simulator=stim
	platform=default
	description=Stim-based CPU-only backend target
	precision=fp64

  - Target quera
	simulator=qpp
	platform=default
	description=CUDA-Q target for QuEra.
	precision=fp32
Supported Arguments:
  - machine (Specify the QuEra QPU.)
  - default_bucket (Specify a default S3 bucket for QuEra results.)

  - Target quantum_machines
	simulator=qpp
	platform=default
	description=CUDA-Q target for Quantum Machines.
	precision=fp32
Supported Arguments:
  - url (Specify Quantum Machine base server url.)
  - executor (Specify the executor to run. Default is mock)
  - api_key (An API key to access the Qoperator server)

  - Targ

## Hardware-Efficient Ansatz Implementation

The quantum circuit is defined with CUDA-Q's kernel decorator.

In [52]:
@cudaq.kernel
def hardware_efficient_ansatz(qubits: cudaq.qview, 
                               params: List[float],
                               num_layers: int):
    """
    Hardware-efficient ansatz with RZ rotations and CNOT entanglement
    
    Args:
        qubits: quantum register
        params: flattened parameter array
        num_layers: number of circuit layers
    """
    num_qubits = qubits.size()
    param_idx = 0
    
    for layer in range(num_layers):
        # Rotation layer: three RZ gates per qubit (consecutive RZ gates add their angles)
        for i in range(num_qubits):
            rz(params[param_idx], qubits[i])
            param_idx += 1
            rz(params[param_idx], qubits[i])
            param_idx += 1
            rz(params[param_idx], qubits[i])
            param_idx += 1
        
        # Entanglement layer (ring topology)
        if layer < num_layers - 1:
            for i in range(num_qubits):
                x.ctrl(qubits[i], qubits[(i + 1) % num_qubits])


# Test: build a small circuit
num_qubits = 4
num_layers = 2
num_params = num_qubits * 3 * num_layers

np.random.seed(42)
test_params = np.random.random(num_params).tolist()

print(f"\nNumber of circuit parameters: {num_params}")
print(f"Number of qubits: {num_qubits}")
print(f"Number of layers: {num_layers}")


Number of circuit parameters: 24
Number of qubits: 4
Number of layers: 2


## Expectation Value (Standard Approach)

First, compute the expectation value of the Hamiltonian with the standard method.

In [53]:
def create_heisenberg_hamiltonian(num_qubits: int, seed: int = 123) -> Tuple[cudaq.SpinOperator, np.ndarray]:
    """
    Create Heisenberg XYZ Hamiltonian with random coefficients
    H = sum_j (h_j^X X_j X_{j+1} + h_j^Y Y_j Y_{j+1} + h_j^Z Z_j Z_{j+1})

    Returns the SpinOperator and the flat coefficient array, ordered (XX, YY, ZZ) per bond.
    """
    np.random.seed(seed)
    coeffs = np.random.random((num_qubits - 1) * 3)
    
    hamiltonian = 0.0
    coeff_idx = 0
    
    for j in range(num_qubits - 1):
        # XX term
        hamiltonian += coeffs[coeff_idx] * cudaq.spin.x(j) * cudaq.spin.x(j + 1)
        coeff_idx += 1
        # YY term
        hamiltonian += coeffs[coeff_idx] * cudaq.spin.y(j) * cudaq.spin.y(j + 1)
        coeff_idx += 1
        # ZZ term
        hamiltonian += coeffs[coeff_idx] * cudaq.spin.z(j) * cudaq.spin.z(j + 1)
        coeff_idx += 1
    
    return hamiltonian, coeffs


# Build the Hamiltonian
hamiltonian, h_coeffs = create_heisenberg_hamiltonian(num_qubits)
print(f"\nHamiltonian terms: {len(h_coeffs)}")
print(f"Hamiltonian: {hamiltonian}")


Hamiltonian terms: 9
Hamiltonian: (0+0i) + (0.696469+0i) * X0X1 + (0.286139+0i) * Y0Y1 + (0.226851+0i) * Z0Z1 + (0.551315+0i) * X1X2 + (0.719469+0i) * Y1Y2 + (0.423106+0i) * Z1Z2 + (0.980764+0i) * X2X3 + (0.68483+0i) * Y2Y3 + (0.480932+0i) * Z2Z3


In [54]:
# Expectation value computation
def compute_expectation_standard(params: List[float], 
                                 num_qubits: int,
                                 num_layers: int,
                                 hamiltonian: cudaq.SpinOperator) -> float:
    """
    Standard state vector simulation approach
    """
    @cudaq.kernel
    def ansatz_kernel(params: List[float]):
        qubits = cudaq.qvector(num_qubits)
        hardware_efficient_ansatz(qubits, params, num_layers)
    
    # Compute expectation value
    expval = cudaq.observe(ansatz_kernel, hamiltonian, params).expectation()
    return expval


# Run
start_time = time.time()
expval_standard = compute_expectation_standard(test_params, num_qubits, num_layers, hamiltonian)
elapsed = time.time() - start_time

print(f"\n=== Standard Approach ===")
print(f"Expectation value: {expval_standard:.6f}")
print(f"Computation time: {elapsed*1000:.2f} ms")


=== Standard Approach ===
Expectation value: 1.130890
Computation time: 83.08 ms
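
A quick cross-check on the value above, written as a plain-Python sketch that does not call CUDA-Q. The ansatz contains only RZ rotations (diagonal) and CNOTs (basis-state permutations that leave |0...0> in place), so starting from |0000> the state should remain |0000> up to a global phase. In that case only the ZZ terms contribute, and the expectation equals the sum of the ZZ coefficients printed earlier, 0.226851 + 0.423106 + 0.480932 ≈ 1.130889. The helper names, the random parameters, and the bit ordering below are assumptions made for illustration.

```python
import cmath
import random

def apply_rz(state, theta, qubit, n):
    """RZ(theta) = diag(e^{-i theta/2}, e^{+i theta/2}) acting on one qubit."""
    out = []
    for idx, amp in enumerate(state):
        bit = (idx >> (n - 1 - qubit)) & 1
        out.append(amp * cmath.exp(1j * theta * (bit - 0.5)))
    return out

def apply_cnot(state, control, target, n):
    """CNOT permutes basis states: flip the target bit where the control bit is 1."""
    out = [0j] * len(state)
    for idx, amp in enumerate(state):
        if (idx >> (n - 1 - control)) & 1:
            idx ^= 1 << (n - 1 - target)
        out[idx] = amp
    return out

def run_ansatz(params, n, layers):
    """Same gate order as hardware_efficient_ansatz: rotations, then a CNOT ring."""
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    p = 0
    for layer in range(layers):
        for q in range(n):
            for _ in range(3):
                state = apply_rz(state, params[p], q, n)
                p += 1
        if layer < layers - 1:
            for q in range(n):
                state = apply_cnot(state, q, (q + 1) % n, n)
    return state

random.seed(0)  # arbitrary parameters; the conclusion does not depend on them
n, layers = 4, 2
params = [random.random() for _ in range(n * 3 * layers)]
state = run_ansatz(params, n, layers)

zz_coeffs = [0.226851, 0.423106, 0.480932]  # copied from the printed Hamiltonian
print(f"|<0000|psi>| = {abs(state[0]):.6f}")
print(f"sum of ZZ coefficients = {sum(zz_coeffs):.6f}")
```

If the overlap is 1, the comparison between the two methods only exercises the ZZ part of the Hamiltonian. Replacing some of the `rz` gates with `rx` or `ry` would likely make the benchmark more informative, at the cost of changing the reference numbers in this notebook.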


## Pauli Propagation Implementation (CUDA-Q Style)

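This section attempts an efficient implementation based on the Heisenberg picture.

**Note**: CUDA-Q is optimized mainly for state-vector simulation, so Pauli propagation is implemented at the Python level and CUDA-Q kernels are called only where needed.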

In [55]:
class PauliWord:
    """Simple Pauli word representation"""
    def __init__(self, operators: Dict[int, str]):
        # operators: {qubit_idx: 'I'/'X'/'Y'/'Z'}
        self.ops = {k: v for k, v in operators.items() if v != 'I'}
    
    def weight(self) -> int:
        """Number of non-identity operators"""
        return len(self.ops)
    
    def get(self, qubit: int) -> str:
        return self.ops.get(qubit, 'I')
    
    def __repr__(self):
        return f"PauliWord({self.ops})"


class PauliSentence:
    """Collection of Pauli words with coefficients"""
    def __init__(self):
        self.terms = {}  # {frozenset(ops.items()): coeff}
    
    def add_term(self, pauli_word: PauliWord, coeff: complex):
        key = frozenset(pauli_word.ops.items())
        self.terms[key] = self.terms.get(key, 0) + coeff
    
    def items(self):
        for key, coeff in self.terms.items():
            yield PauliWord(dict(key)), coeff
    
    def __repr__(self):
        return f"PauliSentence with {len(self.terms)} terms"


print("\nPauli representation classes defined.")


Pauli representation classes defined.
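
The `frozenset` key in `PauliSentence` is what lets like terms merge regardless of the order in which qubits were inserted. A minimal standalone sketch of that idea follows; `terms` and `add_term` here are illustrative stand-ins, not the classes above.

```python
from collections import defaultdict

terms = defaultdict(complex)  # {frozenset of (qubit, pauli) pairs: coefficient}

def add_term(ops, coeff):
    """Accumulate coeff under an order-independent key; identities are dropped."""
    key = frozenset((q, p) for q, p in ops.items() if p != "I")
    terms[key] += coeff

add_term({0: "X", 1: "Y"}, 0.5)
add_term({1: "Y", 0: "X", 2: "I"}, 0.25)  # same word, different insertion order
add_term({}, 1.0)                          # the identity word maps to the empty key

for key, coeff in terms.items():
    print(sorted(key), coeff)
```

Because identities are stripped before hashing, `{0: "X", 1: "Y"}` and `{1: "Y", 0: "X", 2: "I"}` land in the same bucket and their coefficients add.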


In [56]:
# CNOT lookup table (same as PennyLane version)
CNOT_TABLE = {
    ("I", "I"): (("I", "I"), 1),
    ("I", "X"): (("I", "X"), 1),
    ("I", "Y"): (("Z", "Y"), 1),
    ("I", "Z"): (("Z", "Z"), 1),
    ("X", "I"): (("X", "X"), 1),
    ("X", "X"): (("X", "I"), 1),
    ("X", "Y"): (("Y", "Z"), 1),
    ("X", "Z"): (("Y", "Y"), -1),
    ("Y", "I"): (("Y", "X"), 1),
    ("Y", "X"): (("Y", "I"), 1),
    ("Y", "Y"): (("X", "Z"), -1),
    ("Y", "Z"): (("X", "Y"), 1),
    ("Z", "I"): (("Z", "I"), 1),
    ("Z", "X"): (("Z", "X"), 1),
    ("Z", "Y"): (("I", "Y"), 1),
    ("Z", "Z"): (("I", "Z"), 1),
}


def apply_cnot_propagation(control: int, target: int, 
                           H: PauliSentence, k: int = None) -> PauliSentence:
    """Apply CNOT in Heisenberg picture"""
    new_H = PauliSentence()
    
    for pauli_word, coeff in H.items():
        op_c = pauli_word.get(control)
        op_t = pauli_word.get(target)
        
        (new_op_c, new_op_t), factor = CNOT_TABLE[(op_c, op_t)]
        
        new_ops = pauli_word.ops.copy()
        if new_op_c != 'I':
            new_ops[control] = new_op_c
        elif control in new_ops:
            del new_ops[control]
        
        if new_op_t != 'I':
            new_ops[target] = new_op_t
        elif target in new_ops:
            del new_ops[target]
        
        new_pw = PauliWord(new_ops)
        
        if k is None or new_pw.weight() <= k:
            new_H.add_term(new_pw, factor * coeff)
    
    return new_H


print("CNOT propagation function defined.")

CNOT propagation function defined.
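
The lookup table can be checked independently by conjugating each two-qubit Pauli with the CNOT matrix. The sketch below does this by brute force in plain Python; it assumes the first tensor factor is the control qubit, and all helper names are illustrative.

```python
import itertools

PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

# CNOT with the first qubit as control, basis order |00>, |01>, |10>, |11>
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def kron(a, b):
    """Kronecker product of two 2x2 matrices."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def equals_up_to_sign(a, b, sign):
    return all(abs(a[i][j] - sign * b[i][j]) < 1e-12
               for i in range(4) for j in range(4))

derived = {}
for c, t in itertools.product("IXYZ", repeat=2):
    # CNOT is its own inverse, so CNOT† P CNOT = CNOT P CNOT
    conj = matmul(matmul(CNOT, kron(PAULIS[c], PAULIS[t])), CNOT)
    for c2, t2 in itertools.product("IXYZ", repeat=2):
        for sign in (1, -1):
            if equals_up_to_sign(conj, kron(PAULIS[c2], PAULIS[t2]), sign):
                derived[(c, t)] = ((c2, t2), sign)

for key in [("X", "I"), ("I", "Z"), ("X", "Z"), ("Y", "Y")]:
    print(key, "->", derived[key])
```

Comparing `derived` with `CNOT_TABLE` entry by entry is a cheap regression test whenever the table is edited by hand.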


In [57]:
def pauli_commute(pauli1: str, pauli2: str) -> bool:
    """Check if two Pauli operators commute"""
    if pauli1 == 'I' or pauli2 == 'I' or pauli1 == pauli2:
        return True
    return False


def apply_rotation_propagation(pauli_type: str, qubit: int, param: float,
                               H: PauliSentence) -> PauliSentence:
    """Apply rotation gate in Heisenberg picture
    
    O -> R_p(θ)† O R_p(θ), with R_p(θ) = exp(-iθp/2) and p ∈ {X, Y, Z}
    
    If [p, O] = 0: O is unchanged
    If [p, O] ≠ 0: O -> cos(θ) O + (i/2) sin(θ) [p, O]
    """
    new_H = PauliSentence()
    
    for pauli_word, coeff in H.items():
        op_at_qubit = pauli_word.get(qubit)
        
        if pauli_commute(pauli_type, op_at_qubit):
            # Unchanged
            new_H.add_term(pauli_word, coeff)
        else:
            # Split into two terms: cos(θ)O + sin(θ) * new_op
            new_H.add_term(pauli_word, np.cos(param) * coeff)
            
            # Compute the transformed Pauli operator
            # Based on: [Z,X] = 2iY, [Z,Y] = -2iX, etc.
            new_ops = pauli_word.ops.copy()
            
            # Pauli multiplication table for commutator [p, O]
            if pauli_type == 'Z':
                if op_at_qubit == 'X':
                    new_ops[qubit] = 'Y'
                    phase = 1  # [Z,X] = 2iY → Y with +sin
                elif op_at_qubit == 'Y':
                    new_ops[qubit] = 'X'
                    phase = -1  # [Z,Y] = -2iX → X with -sin
            elif pauli_type == 'X':
                if op_at_qubit == 'Y':
                    new_ops[qubit] = 'Z'
                    phase = 1  # [X,Y] = 2iZ
                elif op_at_qubit == 'Z':
                    new_ops[qubit] = 'Y'
                    phase = -1  # [X,Z] = -2iY
            elif pauli_type == 'Y':
                if op_at_qubit == 'Z':
                    new_ops[qubit] = 'X'
                    phase = 1  # [Y,Z] = 2iX
                elif op_at_qubit == 'X':
                    new_ops[qubit] = 'Z'
                    phase = -1  # [Y,X] = -2iZ
            
            new_pw = PauliWord(new_ops)
            # With [p, O] = 2i * phase * new_op, the non-commuting part is
            # (i/2) * sin(θ) * [p, O] = -phase * sin(θ) * new_op
            new_H.add_term(new_pw, -phase * np.sin(param) * coeff)
    
    return new_H


print("Rotation propagation function defined.")

Rotation propagation function defined.
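
The signs used for the RZ case can be verified numerically on 2x2 matrices. The following sketch assumes the convention RZ(θ) = exp(-iθZ/2) and checks that RZ(θ)† X RZ(θ) = cos(θ) X - sin(θ) Y and RZ(θ)† Y RZ(θ) = cos(θ) Y + sin(θ) X; the helper names are illustrative.

```python
import cmath
import math

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]

def rz(theta):
    """RZ(theta) = exp(-i theta Z / 2)."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

def dagger(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combo(a, m1, b, m2):
    """Return a*m1 + b*m2 for 2x2 matrices."""
    return [[a * m1[i][j] + b * m2[i][j] for j in range(2)] for i in range(2)]

def max_deviation(m1, m2):
    return max(abs(m1[i][j] - m2[i][j]) for i in range(2) for j in range(2))

theta = 0.7  # arbitrary test angle
c, s = math.cos(theta), math.sin(theta)

x_evolved = matmul(matmul(dagger(rz(theta)), X), rz(theta))
y_evolved = matmul(matmul(dagger(rz(theta)), Y), rz(theta))

dev_x = max_deviation(x_evolved, combo(c, X, -s, Y))  # X -> cos X - sin Y
dev_y = max_deviation(y_evolved, combo(c, Y, s, X))   # Y -> cos Y + sin X
print(f"deviation for X rule: {dev_x:.2e}")
print(f"deviation for Y rule: {dev_y:.2e}")
```

Both deviations should be at floating-point noise level, which matches the `-phase * sin(θ)` coefficient used in `apply_rotation_propagation`.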


In [58]:
def initial_state_expectation(H: PauliSentence) -> float:
    """
    Compute <0|H|0> for computational basis state |0>
    Only I and Z operators have non-zero expectation
    """
    expval = 0.0
    for pauli_word, coeff in H.items():
        if all(op in ['I', 'Z'] for op in pauli_word.ops.values()):
            expval += coeff
    return expval


print("Initial state expectation function defined.")

Initial state expectation function defined.
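
The rule that only I and Z survive follows from the product structure of |0...0>: the expectation of a Pauli word factorizes into single-qubit values, and <0|X|0> = <0|Y|0> = 0 while <0|I|0> = <0|Z|0> = 1. A small illustrative sketch, with hypothetical names:

```python
# Single-qubit expectation values <0|P|0>
ZERO_STATE_VALUE = {"I": 1.0, "X": 0.0, "Y": 0.0, "Z": 1.0}

def zero_state_expectation(word):
    """<0...0| P_1 ⊗ ... ⊗ P_n |0...0> as a product of single-qubit values."""
    value = 1.0
    for pauli in word:
        value *= ZERO_STATE_VALUE[pauli]
    return value

for word in ["ZZII", "IZIZ", "XXII", "ZYIZ"]:
    print(word, zero_state_expectation(word))
```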


## Full Pauli Propagation Execution

In [59]:
def hamiltonian_to_pauli_sentence(num_qubits: int, coeffs: np.ndarray) -> PauliSentence:
    """
    Build the PauliSentence for the Heisenberg model from its coefficients.

    The cudaq.SpinOperator is not parsed here; the terms are rebuilt in the
    same (XX, YY, ZZ) order used by create_heisenberg_hamiltonian. Passing
    num_qubits and coeffs explicitly avoids depending on notebook globals.
    """
    H = PauliSentence()
    
    for j in range(num_qubits - 1):
        coeff_idx = j * 3
        # XX
        pw = PauliWord({j: 'X', j+1: 'X'})
        H.add_term(pw, coeffs[coeff_idx])
        # YY
        pw = PauliWord({j: 'Y', j+1: 'Y'})
        H.add_term(pw, coeffs[coeff_idx + 1])
        # ZZ
        pw = PauliWord({j: 'Z', j+1: 'Z'})
        H.add_term(pw, coeffs[coeff_idx + 2])
    
    return H


# Convert Hamiltonian
H_pauli = hamiltonian_to_pauli_sentence(num_qubits, h_coeffs)
print(f"\nConverted Hamiltonian: {H_pauli}")


Converted Hamiltonian: PauliSentence with 9 terms


In [60]:
def execute_pauli_propagation(params: List[float],
                             num_qubits: int,
                             num_layers: int,
                             H: PauliSentence,
                             k: int = None) -> float:
    """
    Execute full Pauli propagation algorithm

    Gates are undone in the reverse of the order used in
    hardware_efficient_ansatz. Within each layer the kernel applies the
    rotations first and the CNOT ring second, so the CNOT ring is
    propagated first here.
    """
    total_params = num_qubits * 3 * num_layers
    param_idx = total_params - 1  # Reverse order (Heisenberg picture)
    
    # Propagate through circuit layers in reverse
    for layer in range(num_layers - 1, -1, -1):
        # Reverse entanglement layer (applied AFTER this layer's rotations in the forward pass)
        if layer < num_layers - 1:
            for i in range(num_qubits - 1, -1, -1):
                control = i
                target = (i + 1) % num_qubits
                H = apply_cnot_propagation(control, target, H, k=k)
        
        # Reverse rotation layer (applied BEFORE this layer's CNOT ring in the forward pass)
        for i in range(num_qubits - 1, -1, -1):
            for _ in range(3):
                param = params[param_idx]
                H = apply_rotation_propagation('Z', i, param, H)
                param_idx -= 1
    
    # Compute expectation with initial state
    return initial_state_expectation(H)


# Run with truncation, using a higher k (words on 4 qubits have weight at most 4, so k = 6 drops nothing)
k_truncate = 6

start_time = time.time()
expval_pauli = execute_pauli_propagation(test_params, num_qubits, num_layers, H_pauli, k=k_truncate)
elapsed_pauli = time.time() - start_time

print(f"\n=== Pauli Propagation Approach ===")
print(f"Expectation value: {expval_pauli:.6f}")
print(f"Computation time: {elapsed_pauli*1000:.2f} ms")
print(f"Truncation threshold k: {k_truncate}")

print(f"\n=== Comparison ===")
print(f"Standard:          {expval_standard:.6f}")
print(f"Pauli Propagation: {expval_pauli:.6f}")
print(f"Difference:        {abs(expval_standard - expval_pauli):.6f}")
print(f"Speedup:           {elapsed/elapsed_pauli:.2f}x")


=== Pauli Propagation Approach ===
Expectation value: 1.130890
Computation time: 1.43 ms
Truncation threshold k: 6

=== Comparison ===
Standard:          1.130890
Pauli Propagation: 1.130890
Difference:        0.000000
Speedup:           58.14x
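
The timings above come from a single `time.time()` measurement per method, so the reported speedup is sensitive to one-off costs. In particular, the first call to a CUDA-Q kernel may include compilation overhead; that is an assumption here, not something measured in this notebook. A hedged sketch of a more robust timer, using a hypothetical `best_of` helper with a warm-up call and `time.perf_counter`:

```python
import time

def best_of(fn, repeats=5, warmup=1):
    """Return the fastest wall-clock time (seconds) of `repeats` calls to fn()."""
    for _ in range(warmup):
        fn()  # discard warm-up runs (caches, lazy initialization, compilation)
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)

# Stand-in workload; in the notebook this would wrap the two expectation calls
elapsed_demo = best_of(lambda: sum(i * i for i in range(10_000)))
print(f"best of 5: {elapsed_demo * 1000:.3f} ms")
```

In the notebook, `fn` could be a lambda wrapping `compute_expectation_standard(...)` or `execute_pauli_propagation(...)`, so that both methods are timed under the same protocol.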


## Scaling Test: Larger System

Compare performance on a larger system.

In [66]:
# Larger system - set k sufficiently large
num_qubits_large = 10
num_layers_large = 3
k_large = 8  # higher truncation threshold
num_params_large = num_qubits_large * 3 * num_layers_large

np.random.seed(999)
params_large = np.random.random(num_params_large).tolist()

# Create Hamiltonian
hamiltonian_large, h_coeffs_large = create_heisenberg_hamiltonian(num_qubits_large, seed=999)

print(f"\n=== Larger System ===")
print(f"Qubits: {num_qubits_large}")
print(f"Layers: {num_layers_large}")
print(f"Parameters: {num_params_large}")
print(f"Truncation k: {k_large}")
print(f"Initial Hamiltonian weight: 2 (all XX, YY, ZZ terms)")


=== Larger System ===
Qubits: 10
Layers: 3
Parameters: 90
Truncation k: 8
Initial Hamiltonian weight: 2 (all XX, YY, ZZ terms)


In [67]:
# Standard approach (may be slow)
start = time.time()
expval_std_large = compute_expectation_standard(params_large, num_qubits_large, 
                                                num_layers_large, hamiltonian_large)
time_std = time.time() - start

print(f"\nStandard approach:")
print(f"  Expectation: {expval_std_large:.6f}")
print(f"  Time: {time_std*1000:.2f} ms")


Standard approach:
  Expectation: 3.894026
  Time: 92.07 ms


In [68]:
# Pauli propagation
# Convert Hamiltonian
H_pauli_large = PauliSentence()
for j in range(num_qubits_large - 1):
    coeff_idx = j * 3
    H_pauli_large.add_term(PauliWord({j: 'X', j+1: 'X'}), h_coeffs_large[coeff_idx])
    H_pauli_large.add_term(PauliWord({j: 'Y', j+1: 'Y'}), h_coeffs_large[coeff_idx + 1])
    H_pauli_large.add_term(PauliWord({j: 'Z', j+1: 'Z'}), h_coeffs_large[coeff_idx + 2])

start = time.time()
expval_pauli_large = execute_pauli_propagation(params_large, num_qubits_large,
                                               num_layers_large, H_pauli_large, k=k_large)
time_pauli = time.time() - start

print(f"\nPauli propagation:")
print(f"  Expectation: {expval_pauli_large:.6f}")
print(f"  Time: {time_pauli*1000:.2f} ms")

print(f"\n=== Results ===")
print(f"Difference: {abs(expval_std_large - expval_pauli_large):.6f}")
print(f"Speedup: {time_std/time_pauli:.2f}x")


Pauli propagation:
  Expectation: 2.491446
  Time: 15.84 ms

=== Results ===
Difference: 1.402580
Speedup: 5.81x


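## Conclusions and Key Findings

### 1. **Accuracy of Pauli Propagation**
- The truncation threshold `k` has a decisive effect on accuracy: the 4-qubit run with k = 6 matched the standard result, while the 10-qubit, 3-layer run with k = 8 differed by 1.402580
- The Heisenberg Hamiltonian starts at weight 2
- Weight grows as operators pass through the circuit, so a larger k is needed
- **Rule of thumb**: k ≥ 2 × (initial weight) + (circuit depth). This is a heuristic, not a guarantee: taking depth as the number of layers, the 10-qubit run satisfied it (8 ≥ 2 × 2 + 3) and was still inaccurate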
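
The weight growth noted above can be traced explicitly for the Z-only terms, which are the only ones that contribute to <0|H|0> in an RZ + CNOT circuit. The sketch below is a plain-Python illustration under stated assumptions: it mirrors the reversed CNOT ring and the "drop if weight > k" rule used in `execute_pauli_propagation`, skips the RZ gates because they commute with Z, and uses a hypothetical helper name. It suggests that the 10-qubit discrepancy comes from ZZ terms whose intermediate weight exceeds k = 8.

```python
def propagate_z_word(support, n, layers, k=None):
    """Back-propagate a Z-only Pauli word through (layers - 1) reversed CNOT rings.

    support: set of qubits carrying Z. In the Heisenberg picture CNOT(c, t) maps
    Z_t -> Z_c Z_t and leaves Z_c unchanged, so Z on the control toggles whenever
    the target carries Z.
    Returns (final support, or None if truncated; peak weight seen before stopping).
    """
    support = set(support)
    peak = len(support)
    for _ in range(layers - 1):
        for c in range(n - 1, -1, -1):
            t = (c + 1) % n
            if t in support:
                support ^= {c}  # toggle Z on the control qubit
            peak = max(peak, len(support))
            if k is not None and len(support) > k:
                return None, peak  # term discarded by truncation
    return support, peak

n, layers, k = 10, 3, 8
dropped = []
peaks = {}
for j in range(n - 1):
    _, peaks[j] = propagate_z_word({j, j + 1}, n, layers)      # no truncation
    kept, _ = propagate_z_word({j, j + 1}, n, layers, k=k)      # with truncation
    if kept is None:
        dropped.append(j)
    status = "dropped" if kept is None else "kept"
    print(f"Z{j}Z{j + 1}: peak weight {peaks[j]}, {status} at k={k}")
```

Any term reported as dropped removes its ZZ coefficient from the propagated expectation value, so the truncated result undershoots the exact one by the sum of those coefficients.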

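### 2. **CUDA-Q vs Pauli Propagation**
- **CUDA-Q strengths**: 
  - GPU acceleration makes it powerful for large systems
  - Exact state-vector simulation
  - Robust even for complex observables
  
- **Pauli propagation strengths**:
  - Very fast when k is chosen appropriately
  - Memory-efficient (no state vector required)
  - Particularly well suited to sparse Hamiltonians

### 3. **Hybrid Strategy**
- Low-weight Hamiltonian + shallow circuit → Pauli propagation
- Large system + complex circuit → CUDA-Q state vector
- Gradient computation: the two methods can be used side by side

### Future Work
- Implement Pauli propagation directly as a CUDA-Q C++ kernel (GPU acceleration)
- Use multiple GPUs (nvidia-mgpu target)
- Gradient computation combined with the parameter-shift rule
- Application to variational algorithms (VQE, QAOA)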

## Truncation Effect Analysis

Compare accuracy and performance across different values of k.


In [69]:
# Test different k values on the small system
print("=== Truncation Effect Analysis (Small System) ===\n")

k_values = [3, 4, 5, 6, 8, None]  # None = no truncation
# num_qubits must match test_params (24 entries) and H_pauli (4 qubits);
# setting it to 10 here indexes past the end of test_params and raises IndexError
num_qubits = 4
results = []
for k_val in k_values:
    start = time.time()
    exp_val = execute_pauli_propagation(test_params, num_qubits, num_layers, H_pauli, k=k_val)
    elapsed = time.time() - start
    
    error = abs(expval_standard - exp_val)
    results.append({
        'k': k_val if k_val is not None else '∞',
        'expval': exp_val,
        'error': error,
        'time_ms': elapsed * 1000
    })
    
    print(f"k={str(k_val if k_val is not None else '∞'):>3} | "
          f"Expval: {exp_val:>10.6f} | "
          f"Error: {error:>10.6e} | "
          f"Time: {elapsed*1000:>6.2f} ms")

print(f"\nReference (Standard): {expval_standard:.6f}")
print(f"\nObservation: a larger k generally improves accuracy but also increases computation time.")