# Optimizers and Gradients

Many quantum algorithms require the optimization of quantum circuit parameters with respect to an expectation value. CUDA-Q provides a comprehensive suite of optimization tools for hybrid quantum-classical algorithms like VQE (Variational Quantum Eigensolver).

This notebook will demonstrate:

1. **Built-in CUDA-Q Optimizers**: Worked examples for Adam, SGD, and SPSA; COBYLA, NelderMead, LBFGS, and GradientDescent are listed in the overview
2. **Optimizer Parameters**: Configuration options for each optimizer, with their defaults
3. **Gradient Strategies**: CentralDifference is used in the examples; ForwardDifference and ParameterShift are also available
4. **Third-Party Optimizers**: Integration with SciPy
5. **Parallel Parameter Shift**: Multi-GPU gradient computation

## CUDA-Q Optimizer Overview

CUDA-Q includes the following optimizers:

### Gradient-Free Optimizers (no gradients required):
- **COBYLA**: Constrained Optimization BY Linear Approximations
- **NelderMead**: Simplex-based derivative-free optimizer
- **SPSA**: Simultaneous Perturbation Stochastic Approximation (excellent for noisy functions)

### Gradient-Based Optimizers (require gradients):
- **Adam**: Adaptive Moment Estimation with momentum (recommended for most cases)
- **SGD**: Stochastic Gradient Descent
- **LBFGS**: Limited-memory BFGS quasi-Newton method
- **GradientDescent**: Basic gradient descent

First, let's set up the kernel and Hamiltonian that we'll use throughout the examples.

In [1]:
import cudaq
from cudaq import spin
import numpy as np

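# Two-qubit Hamiltonian written as a sum of Pauli terms
hamiltonian = 5.907 - 2.1433 * spin.x(0) * spin.x(1) - 2.1433 * spin.y(
    0) * spin.y(1) + .21829 * spin.z(0) - 6.125 * spin.z(1)

# One-parameter ansatz: X on qubit 0, RY(angle) on qubit 1, then a CNOT
# with qubit 1 as control and qubit 0 as target
@cudaq.kernel
def kernel(angles: list[float]):
    qubits = cudaq.qvector(2)
    x(qubits[0])
    ry(angles[0], qubits[1])
    x.ctrl(qubits[1], qubits[0])

# Random starting angle (unseeded, so results vary from run to run)
initial_params = np.random.normal(0, np.pi, 1)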
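
As a reference point for the optimizer results later in the notebook, the Hamiltonian is small enough to diagonalize directly. The following is a minimal NumPy sketch, not part of the CUDA-Q workflow: it builds the 4x4 matrix from Pauli matrices and reads off the lowest eigenvalue. The variable names (`I2`, `H`, `ground_energy`) are illustrative.

```python
import numpy as np

# Single-qubit identity and Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Dense matrix of the Hamiltonian. np.kron(A, B) puts A on qubit 0 and
# B on qubit 1; the eigenvalues do not depend on this ordering choice.
H = (5.907 * np.kron(I2, I2)
     - 2.1433 * np.kron(X, X)
     - 2.1433 * np.kron(Y, Y)
     + 0.21829 * np.kron(Z, I2)
     - 6.125 * np.kron(I2, Z))

# eigvalsh returns the eigenvalues of a Hermitian matrix in ascending order
eigenvalues = np.linalg.eigvalsh(H)
ground_energy = float(eigenvalues[0])
print(f"Exact ground-state energy: {ground_energy:.6f}")
```

The lowest eigenvalue is about -1.7489, which is the value a converged VQE run on this Hamiltonian should approach.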

## 1. Built-in CUDA-Q Optimizers and Gradients

CUDA-Q provides several optimizers with configurable parameters. Let's explore the most commonly used optimizers: **Adam**, **SGD**, and **SPSA**.

### 1.1 Adam Optimizer with Parameter Configuration

**Adam (Adaptive Moment Estimation)** combines momentum and adaptive learning rates for efficient optimization. It's particularly effective for problems with noisy gradients.

**Configurable Parameters:**
- `step_size` (default: 0.01): Learning rate for parameter updates
- `beta1` (default: 0.9): Exponential decay rate for first moment (momentum)
- `beta2` (default: 0.999): Exponential decay rate for second moment (adaptive learning)
- `epsilon` (default: 1e-8): Small constant for numerical stability
- `batch_size` (default: 1): Number of samples per batch
- `f_tol` (default: 1e-4): Convergence tolerance
- `max_iterations`: Maximum number of iterations
- `initial_parameters`: Starting parameter values
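
To make the roles of `step_size`, `beta1`, `beta2`, and `epsilon` concrete, here is a minimal pure-Python sketch of the standard Adam update. It is not CUDA-Q's implementation. In place of a `cudaq.observe` call it uses a hand-derived closed form for this notebook's one-parameter ansatz, E(θ) = 5.907 - 6.34329 cos θ - 4.2866 sin θ, which should be treated as an assumption; `adam_minimize` is a hypothetical helper name.

```python
import math

def energy(theta):
    # Assumed closed-form stand-in for <H>(theta) for this notebook's ansatz
    return 5.907 - 6.34329 * math.cos(theta) - 4.2866 * math.sin(theta)

def energy_grad(theta):
    # Analytic derivative of the stand-in energy
    return 6.34329 * math.sin(theta) - 4.2866 * math.cos(theta)

def adam_minimize(theta, step_size=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, max_iterations=100):
    m = 0.0  # first-moment (momentum) estimate
    v = 0.0  # second-moment (squared-gradient) estimate
    for t in range(1, max_iterations + 1):
        g = energy_grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # epsilon keeps the division well defined when v_hat is tiny
        theta -= step_size * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, energy(theta)

theta_final, e_final = adam_minimize(0.0)
print(f"theta = {theta_final:.4f}, energy = {e_final:.4f}")
```

A larger `beta1` carries more momentum through the minimum, which is why Adam can still be oscillating slightly when `max_iterations` is reached.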

The optimizer and gradient strategy are configured below. The objective function wraps a `cudaq.observe` call in a lambda expression that returns the expectation value (the cost), and the gradient strategy's `compute` method evaluates the gradient at the current parameters.

In [2]:
# Configure Adam optimizer with custom parameters
optimizer = cudaq.optimizers.Adam()
optimizer.step_size = 0.1                      # Learning rate
optimizer.beta1 = 0.9                          # First moment decay
optimizer.beta2 = 0.999                        # Second moment decay
optimizer.epsilon = 1e-8                       # Numerical stability
optimizer.max_iterations = 100                 # Maximum iterations
optimizer.initial_parameters = initial_params  # Set initial parameters

# Use CentralDifference gradient strategy
gradient = cudaq.gradients.CentralDifference()

def objective_function(parameter_vector: list[float],
                       hamiltonian=hamiltonian,
                       gradient_strategy=gradient,
                       kernel=kernel) -> tuple[float, list[float]]:
    """
    Objective function for gradient-based optimizers.
    Returns: (cost, gradient_vector)
    """
    get_result = lambda parameter_vector: cudaq.observe(kernel, hamiltonian, parameter_vector).expectation()

    cost = get_result(parameter_vector)
    
    gradient_vector = gradient_strategy.compute(parameter_vector, get_result, cost)

    return cost, gradient_vector
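
For intuition about what a finite-difference gradient strategy does with the cost function it is handed, the sketch below implements forward and central differences in plain Python. This is an illustration of the two formulas, not CUDA-Q's code: the step `h` is a hypothetical value, and the cost is an assumed closed form for this notebook's ansatz rather than a `cudaq.observe` call.

```python
import math

def energy(params):
    # Assumed closed-form stand-in for <H> with a single angle params[0]
    theta = params[0]
    return 5.907 - 6.34329 * math.cos(theta) - 4.2866 * math.sin(theta)

def forward_difference(f, params, h=1e-4):
    # One extra evaluation per parameter; error shrinks linearly with h
    base = f(params)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += h
        grad.append((f(shifted) - base) / h)
    return grad

def central_difference(f, params, h=1e-4):
    # Two evaluations per parameter; error shrinks quadratically with h
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad

params = [0.3]
exact = 6.34329 * math.sin(params[0]) - 4.2866 * math.cos(params[0])
fwd_error = abs(forward_difference(energy, params)[0] - exact)
ctr_error = abs(central_difference(energy, params)[0] - exact)
print(f"forward error = {fwd_error:.2e}, central error = {ctr_error:.2e}")
```

Central differences cost one more circuit evaluation per parameter than forward differences but are noticeably more accurate for the same step.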

Now run the optimizer to find the optimal energy and parameters. Adam will use adaptive learning rates for each parameter.

In [3]:
energy, parameter = optimizer.optimize(dimensions=1, function=objective_function)

print(f"\n=== Adam Optimizer Results ===")
print(f"Minimized <H> = {energy:.6f}")
print(f"Optimal parameters: {[round(p, 6) for p in parameter]}")


=== Adam Optimizer Results ===
Minimized <H> = -1.744713
Optimal parameters: [-5.721116]


### 1.2 SGD (Stochastic Gradient Descent) Optimizer

**SGD** is a fundamental optimization algorithm that updates parameters by taking steps proportional to the negative gradient.

**Configurable Parameters:**
- `step_size` (default: 0.01): Learning rate for parameter updates
- `batch_size` (default: 1): Number of samples per batch
- `f_tol` (default: 1e-4): Convergence tolerance
- `max_iterations`: Maximum number of iterations
- `initial_parameters`: Starting parameter values

SGD is simpler than Adam and can be effective when you understand your problem well enough to tune the learning rate appropriately.
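
The update rule itself fits in a few lines. The sketch below is a plain gradient-descent loop in pure Python, not CUDA-Q's SGD. It uses an assumed closed form for this notebook's cost, and the stopping test on `f_tol` is only a guess at how such a tolerance might be applied.

```python
import math

def energy(theta):
    # Assumed closed-form stand-in for <H>(theta) for this notebook's ansatz
    return 5.907 - 6.34329 * math.cos(theta) - 4.2866 * math.sin(theta)

def energy_grad(theta):
    return 6.34329 * math.sin(theta) - 4.2866 * math.cos(theta)

def gradient_descent(theta, step_size=0.05, max_iterations=100, f_tol=1e-6):
    cost = energy(theta)
    for _ in range(max_iterations):
        # Step against the gradient, scaled by the learning rate
        theta -= step_size * energy_grad(theta)
        new_cost = energy(theta)
        # Hypothetical convergence test: stop once the cost barely changes
        if abs(cost - new_cost) < f_tol:
            cost = new_cost
            break
        cost = new_cost
    return theta, cost

theta_final, e_final = gradient_descent(0.0)
print(f"theta = {theta_final:.4f}, energy = {e_final:.6f}")
```

With a fixed `step_size`, convergence speed depends directly on the curvature of the cost, which is why the learning rate needs tuning per problem.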

In [4]:
# Configure SGD optimizer
sgd_optimizer = cudaq.optimizers.SGD()
sgd_optimizer.step_size = 0.05       # Learning rate
sgd_optimizer.batch_size = 1         # Stochastic mode
sgd_optimizer.max_iterations = 100   # Maximum iterations
sgd_optimizer.f_tol = 1e-6           # Convergence tolerance
sgd_optimizer.initial_parameters = initial_params

# Run optimization
sgd_energy, sgd_params = sgd_optimizer.optimize(dimensions=1, function=objective_function)

print(f"\n=== SGD Optimizer Results ===")
print(f"Minimized <H> = {sgd_energy:.6f}")
print(f"Optimal parameters: {[round(p, 6) for p in sgd_params]}")


=== SGD Optimizer Results ===
Minimized <H> = -1.748865
Optimal parameters: [-5.688733]


### 1.3 SPSA (Simultaneous Perturbation Stochastic Approximation)

**SPSA** is a gradient-free stochastic optimization algorithm that is particularly useful for noisy objective functions (like quantum hardware with shot noise). It approximates gradients using simultaneous perturbations and requires only **2 function evaluations per iteration** regardless of problem dimension.

**Configurable Parameters:**
- `step_size` (default: 0.3): Evaluation step size for gradient approximation
- `gamma` (default: 0.101): Scaling exponent for step size schedule
- `max_iterations`: Maximum number of iterations
- `initial_parameters`: Starting parameter values

**Key Advantage**: SPSA does **not** require gradients, making it ideal for noisy functions and quantum hardware.
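
The "two evaluations per iteration" property can be seen in a short sketch of the simultaneous-perturbation estimate. This is a generic textbook form in pure Python, not CUDA-Q's SPSA: `spsa_gradient`, the perturbation size `c`, and the three-parameter toy cost are all illustrative assumptions.

```python
import math
import random

def energy(theta):
    # Assumed closed-form stand-in for a one-angle expectation value
    return 5.907 - 6.34329 * math.cos(theta) - 4.2866 * math.sin(theta)

class CountingCost:
    """Toy multi-parameter cost that counts how often it is evaluated."""
    def __init__(self):
        self.calls = 0

    def __call__(self, params):
        self.calls += 1
        return sum(energy(p) for p in params)

def spsa_gradient(f, params, c=0.3, rng=random):
    # Perturb every parameter at once by +c or -c, chosen at random
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = [p + c * d for p, d in zip(params, delta)]
    minus = [p - c * d for p, d in zip(params, delta)]
    # Only two cost evaluations, independent of len(params)
    diff = f(plus) - f(minus)
    return [diff / (2.0 * c * d) for d in delta]

random.seed(0)
cost = CountingCost()
estimate = spsa_gradient(cost, [0.1, 0.2, 0.3])
print(f"evaluations used: {cost.calls}, estimate length: {len(estimate)}")
```

The estimate is noisy for any single draw of the perturbation, but it is cheap, which is the trade-off SPSA makes against per-parameter finite differences.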

In [5]:
# Configure SPSA optimizer
spsa_optimizer = cudaq.optimizers.SPSA()
spsa_optimizer.step_size = 0.3       # Evaluation step size
spsa_optimizer.gamma = 0.101         # Scaling exponent
spsa_optimizer.max_iterations = 100  # Maximum iterations
spsa_optimizer.initial_parameters = initial_params

# Define gradient-free objective function
def spsa_objective(parameter_vector: list[float]) -> float:
    """
    Objective function for gradient-free optimizers like SPSA.
    Returns: cost only (no gradient)
    """
    return cudaq.observe(kernel, hamiltonian, parameter_vector).expectation()

# Run optimization
spsa_energy, spsa_params = spsa_optimizer.optimize(dimensions=1, function=spsa_objective)

print(f"\n=== SPSA Optimizer Results ===")
print(f"Minimized <H> = {round(spsa_energy, 6)}")
print(f"Optimal parameters: {[round(p, 6) for p in spsa_params]}")


=== SPSA Optimizer Results ===
Minimized <H> = -1.748668
Optimal parameters: [-5.681724]


## 2. Third-Party Optimizers

CUDA-Q cost functions can also be minimized with third-party optimization libraries such as SciPy, so familiar optimization tools can be combined with CUDA-Q's quantum simulation capabilities.

The same VQE procedure can be carried out with SciPy: a cost function that returns the expectation value is defined and passed to SciPy's `minimize` function.

In [6]:
from scipy.optimize import minimize

def cost(theta):
    # Expectation value of the Hamiltonian at the given angle(s)
    exp_val = cudaq.observe(kernel, hamiltonian, theta).expectation()
    return exp_val

result = minimize(cost, initial_params, method='COBYLA', options={'maxiter': 40})
print(result)

 message: Optimization terminated successfully.
 success: True
  status: 1
     fun: -1.748865011330396
       x: [ 5.943e-01]
    nfev: 26
   maxcv: 0.0
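
The optimal angle reported here looks very different from the values near -5.7 printed by the built-in optimizers, but the angles differ by almost exactly 2π. Assuming the expectation value of this one-parameter circuit is 2π-periodic in the angle (which holds for a single `ry` rotation, up to a global phase), the results can be compared by wrapping each angle into [-π, π). The helper `wrap_angle` below is a hypothetical convenience function, and the angles are copied from the outputs printed in this notebook.

```python
import math

def wrap_angle(theta):
    """Map an angle to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi

# Optimal angles as printed by the cells in this notebook
reported = {
    "Adam": -5.721116,
    "SGD": -5.688733,
    "SPSA": -5.681724,
    "SciPy COBYLA": 0.5943,
}

for name, theta in reported.items():
    print(f"{name:>12}: {wrap_angle(theta):.4f}")
```

After wrapping, all four angles fall between roughly 0.56 and 0.60, so the optimizers agree on the location of the minimum.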


## 3. Parallel Parameter Shift Gradients

The `mqpu` option of CUDA-Q's `nvidia` target allows parameter shift gradients to be computed in parallel on multiple simulated QPUs. Gradients computed this way can be used in any of the gradient-based optimization procedures discussed above. The example below shows how parallel gradient evaluation can be used in a VQE procedure.

The parameter shift procedure computes two expectation values for each parameter: one with the parameter shifted forward and one with it shifted backward. These are used to estimate the gradient contribution for that parameter.
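
For a rotation gate generated by a Pauli operator (such as `rx`, `ry`, or `rz`), the expectation value is a sinusoid in the gate angle, and the parameter-shift identity with shift s is dE/dθ = [E(θ + s) - E(θ - s)] / (2 sin s). Dividing the same difference by 2s instead gives a central finite-difference estimate, which is only accurate for small s. The sketch below compares the two using an assumed closed form for this notebook's cost; the function names are illustrative.

```python
import math

def energy(theta):
    # Assumed closed-form stand-in for <H>(theta) for this notebook's ansatz
    return 5.907 - 6.34329 * math.cos(theta) - 4.2866 * math.sin(theta)

def parameter_shift(f, theta, shift):
    # Exact for sinusoidal costs, for any shift that is not a multiple of pi
    return (f(theta + shift) - f(theta - shift)) / (2.0 * math.sin(shift))

def central_difference(f, theta, shift):
    # Same two evaluations, but only approximate when the shift is large
    return (f(theta + shift) - f(theta - shift)) / (2.0 * shift)

theta = 0.3
shift = math.pi / 4
exact = 6.34329 * math.sin(theta) - 4.2866 * math.cos(theta)

ps_error = abs(parameter_shift(energy, theta, shift) - exact)
fd_error = abs(central_difference(energy, theta, shift) - exact)
print(f"parameter-shift error = {ps_error:.2e}")
print(f"central-difference error = {fd_error:.2e}")
```

With a shift of π/4 the two divisors differ by about 10%, so the choice of divisor matters for the magnitude of the gradient, though not for its sign.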

The following code defines a function that takes a kernel, the circuit parameters, a Hamiltonian (spin operator), and a shift `epsilon`, and produces a parameter shift gradient. The first step of the function builds `xplus` and `xminus`, arrays whose rows hold the parameters with one entry shifted forward or backward by `epsilon`.
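
The construction of the shifted arrays is easier to see with more than one parameter. The sketch below repeats the same `np.tile` and `np.eye` steps on a hypothetical three-parameter vector and prints the result.

```python
import numpy as np

parameters = np.array([0.1, 0.2, 0.3])  # hypothetical parameter vector
epsilon = np.pi / 4

# Stack one copy of the parameter vector per parameter: shape (3, 3)
x = np.tile(parameters, (len(parameters), 1))

# Row i shifts only parameter i, because the identity matrix is zero
# everywhere except on the diagonal
xplus = x + np.eye(x.shape[0]) * epsilon
xminus = x - np.eye(x.shape[0]) * epsilon

print(xplus)
print(xminus)
```

Row `i` of `xplus` and row `i` of `xminus` are therefore the two parameter sets needed for the gradient with respect to parameter `i`.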

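Next, a for loop iterates over all of the parameters and uses `cudaq.observe_async` to compute each expectation value asynchronously. This call also takes `qpu_id` as an input, which specifies the simulated QPU (GPU) that evaluates the call. The code below distributes the calls round-robin over `num_qpus` devices. It sets `num_qpus = 1` so that it runs on a single device; on a system with four GPUs, for example, the `mqpu` target line can be uncommented and `num_qpus` set to 4 to batch the gradient over four devices.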
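
The round-robin assignment can be checked without any GPUs. The sketch below reproduces only the counter logic for a hypothetical case of three parameters and four simulated QPUs, recording which `qpu_id` each shifted evaluation would receive.

```python
num_qpus = 4        # hypothetical number of simulated QPUs
num_parameters = 3  # hypothetical number of circuit parameters

assignments = []  # (parameter index, shift direction, qpu_id)
qpu_counter = 0

for i in range(num_parameters):
    # The plus-shifted evaluation goes to the next QPU in turn
    assignments.append((i, "+", qpu_counter % num_qpus))
    qpu_counter += 1
    # The minus-shifted evaluation goes to the QPU after that
    assignments.append((i, "-", qpu_counter % num_qpus))
    qpu_counter += 1

for index, direction, qpu_id in assignments:
    print(f"parameter {index} ({direction} shift) -> qpu_id {qpu_id}")
```

Six evaluations are spread over four devices, so the first two devices each receive a second job once the counter wraps around.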

The asynchronous results are stored in the `g_plus` and `g_minus` lists. Each element is retrieved with a call like `g_plus[i].get().expectation()`, and the differences between the plus and minus values are used to construct the final gradient.


In [7]:
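import numpy as np
# Uncomment on a multi-GPU system to simulate one QPU per GPU:
# cudaq.set_target('nvidia', option='mqpu')

num_qpus = 1         # Number of simulated QPUs to distribute work over
epsilon = np.pi / 4  # Shift applied to each parameter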


def batched_gradient_function(kernel, parameters, hamiltonian, epsilon):

    # Prepare an array of parameters corresponding to the plus and minus shifts
    x = np.tile(parameters, (len(parameters), 1))
    xplus = x + (np.eye(x.shape[0]) * epsilon)
    xminus = x - (np.eye(x.shape[0]) * epsilon)

    g_plus = []
    g_minus = []

    qpu_counter = 0  # Running count used to assign evaluations to QPUs round-robin

    for i in range(x.shape[0]):
        # Launch both shifted evaluations asynchronously on the next QPUs in turn
        g_plus.append(cudaq.observe_async(kernel, hamiltonian, xplus[i], qpu_id=qpu_counter % num_qpus))
        qpu_counter += 1

        g_minus.append(cudaq.observe_async(kernel, hamiltonian, xminus[i], qpu_id=qpu_counter % num_qpus))
        qpu_counter += 1

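    # Combine the shifted expectation values into the gradient. For rotation
    # gates such as ry, the parameter-shift identity divides by 2*sin(epsilon);
    # dividing by 2*epsilon instead gives a central finite-difference estimate.
    gradient = [(g_plus[i].get().expectation() - g_minus[i].get().expectation()) / (2 * np.sin(epsilon)) for i in range(len(g_minus))]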

    return gradient


This function can be used in a VQE procedure as shown below. The `batched_gradient_function` evaluates the gradient at each optimization step. The objective function returns both the cost and the gradient at the current parameter values, so it can be passed to any gradient-based SciPy optimizer (such as L-BFGS-B) with `jac=True`.

In [8]:
def objective_function(parameter_vector,
                       hamiltonian=hamiltonian,
                       kernel=kernel,
                       epsilon=epsilon):
    """
    Objective function for VQE with parallel parameter shift gradients.
    Computes both cost and gradient at the current parameter values.
    """
    # Compute cost at current parameters
    cost = cudaq.observe(kernel, hamiltonian, parameter_vector).expectation()
    
    # Compute gradient at current parameters using parallel parameter shift
    gradient_vector = batched_gradient_function(kernel, parameter_vector, hamiltonian, epsilon)

    return cost, gradient_vector


In [9]:
# Run VQE optimization with parallel parameter shift gradients
result_vqe = minimize(objective_function, initial_params, method='L-BFGS-B', jac=True, tol=1e-8, options={'maxiter': 50})

print("\n=== VQE with Parallel Parameter Shift Gradients ===")
print(f"Optimized energy: {result_vqe.fun:.6f}")
print(f"Optimal parameters: {result_vqe.x}")
print(f"Number of iterations: {result_vqe.nit}")
print(f"Success: {result_vqe.success}")