# Escaping the Barren Plateau

**QOSF Monthly Challenge - [Jun 2025]**

**Author:** Tan Jun Liang

---

## 📄 Abstract

### **Introduction**

Variational Quantum Algorithms (VQAs) are a leading strategy for unlocking the power of near-term quantum computers for problems in optimization, machine learning, and chemistry. VQAs work by using a classical optimizer to train the parameters of a quantum circuit, much like a classical neural network.

However, as we scale up our quantum models to more qubits, we run into a devastating problem: the **Barren Plateau**. Imagine trying to find the lowest point in a vast, perfectly flat desert. If there's no slope, you don't know which way to go. For a VQA, the optimization landscape can flatten exponentially as the number of qubits increases, so the training gradients vanish. An optimizer that sees gradients indistinguishable from zero is lost in the desert, and training stalls.

This phenomenon is a critical barrier to scaling quantum machine learning. In this challenge, your mission is to witness, diagnose, and overcome the barren plateau.


## 📚 1. The Theory of Barren Plateaus: A Review

The practical utility of any VQA hinges on the ability of a classical optimizer to successfully train a Parameterized Quantum Circuit (PQC). The discovery of barren plateaus revealed a fundamental obstacle to this process. Research has identified several distinct mechanisms that can lead to a flat, untrainable optimization landscape.

<img src="images/output.png" width="600">

### 1.1 Depth-Induced Barren Plateaus

The original work by **McClean et al. [1]** identified that deep PQCs with random parameter initializations form approximate "2-designs." A 2-design is a distribution of unitary transformations that mimics the statistical properties of the full space of all possible unitaries (the Haar measure) up to the second moment. This high degree of scrambling effectively randomizes the output, causing the expectation value of any observable to concentrate exponentially around a fixed value (typically zero), leading to vanishing gradients.
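The concentration effect can be illustrated without any circuit at all. The sketch below is a NumPy toy, not part of the challenge code: it samples Haar-random states directly (the limit that a deep random PQC approximates) and estimates the variance of `<Z_0>`. The helper names are hypothetical. For Haar-random states the exact variance is `1 / (2**n + 1)`, so the printed estimates should shrink roughly by half with every added qubit. Note that this shows concentration of the expectation value, which is the source of the vanishing gradients, rather than the gradient itself.

```python
import numpy as np

def haar_random_state(n_qubits, rng):
    """Sample a Haar-random pure state by normalizing a complex Gaussian vector."""
    dim = 2 ** n_qubits
    vec = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return vec / np.linalg.norm(vec)

def expval_z0(state):
    """<Z> on qubit 0, taken as the most significant bit of the basis index."""
    probs = np.abs(state) ** 2
    half = probs.size // 2
    return probs[:half].sum() - probs[half:].sum()

def variance_of_z0(n_qubits, n_samples=2000, seed=0):
    """Monte Carlo estimate of Var[<Z_0>] over Haar-random states."""
    rng = np.random.default_rng(seed)
    values = [expval_z0(haar_random_state(n_qubits, rng)) for _ in range(n_samples)]
    return float(np.var(values))

for n in (2, 4, 6, 8):
    # Columns: qubits, sampled variance, exact Haar value 1 / (2^n + 1)
    print(n, variance_of_z0(n), 1 / (2 ** n + 1))
```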

### 1.2 Cost Function-Induced Barren Plateaus

A subsequent discovery by **Cerezo et al. [2]** showed that barren plateaus can exist even in shallow circuits if the cost function is sufficiently *global*. A global cost function involves observables that act non-trivially on many qubits (e.g., measuring the parity of all qubits, `ZZ...Z`). Due to the phenomenon of concentration of measure, the expectation values of such observables also concentrate exponentially, leading to vanishing gradients independent of circuit depth. Conversely, **local cost functions**, which measure only a few qubits, avoid this specific mechanism: for shallow circuits (depth growing at most logarithmically with the number of qubits), their gradients vanish at most polynomially.
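A closed-form toy case makes the local/global contrast concrete. The sketch below assumes the simplest possible circuit, a single `RY(theta_i)` on each qubit with no entanglement, which is not the ansatz used in this notebook. For that product state `<Z_0> = cos(theta_0)` and `<Z...Z> = prod_i cos(theta_i)`, so with angles drawn uniformly from `[0, 2*pi)` the gradient variance with respect to `theta_0` is `1/2` for the local cost and `(1/2)**n` for the global cost. The function names are hypothetical.

```python
import math
import random

def gradient_wrt_first_angle(thetas, cost_type):
    """d<O>/d(theta_0) for the product state RY(theta_i)|0> on every qubit.

    local : O = Z_0    -> <O> = cos(theta_0)
    global: O = Z...Z  -> <O> = prod_i cos(theta_i)
    """
    grad = -math.sin(thetas[0])
    if cost_type == "global":
        for theta in thetas[1:]:
            grad *= math.cos(theta)
    return grad

def gradient_variance(n_qubits, cost_type, n_trials=20000, seed=0):
    """Monte Carlo estimate of the gradient variance over uniform random angles."""
    rng = random.Random(seed)
    grads = []
    for _ in range(n_trials):
        thetas = [rng.uniform(0, 2 * math.pi) for _ in range(n_qubits)]
        grads.append(gradient_wrt_first_angle(thetas, cost_type))
    mean = sum(grads) / len(grads)
    return sum((g - mean) ** 2 for g in grads) / len(grads)

for n in (2, 4, 6, 8):
    # Columns: qubits, local variance, global variance, analytic global value
    print(n, gradient_variance(n, "local"), gradient_variance(n, "global"), 0.5 ** n)
```

Even at depth one the global cost loses a factor of two in variance per qubit, while the local cost stays flat.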

### 1.3 Noise-Induced Barren Plateaus (NIBPs)

Even a VQA with a shallow circuit and a local cost function can fail if subjected to sufficient noise. As shown by **Wang et al. [3]**, noise channels acting on all qubits can themselves induce a barren plateau. The noise contracts the reachable state space towards the maximally mixed state, and the contraction compounds with every noisy layer, flattening the landscape and destroying the gradient. This phenomenon is termed a Noise-Induced Barren Plateau (NIBP).
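The contraction towards the maximally mixed state can be checked by hand on one qubit. The sketch below is a NumPy toy under a stated assumption: the depolarizing channel uses Kraus operators `sqrt(1-p) I` and `sqrt(p/3) X, Y, Z`, which is assumed here to match the parametrization of `qml.DepolarizingChannel` (check the PennyLane documentation before relying on it). Under that convention each application scales every Bloch-vector component by `1 - 4p/3`, so `<Z>` decays geometrically with the number of noisy layers.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarize(rho, p):
    """Single-qubit depolarizing channel (assumed Kraus convention, see lead-in)."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

def expval(rho, op):
    """Expectation value Tr(rho * op) for a Hermitian operator."""
    return float(np.real(np.trace(rho @ op)))

p = 0.05  # same value as NOISE_STRENGTH in this notebook
rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|, so <Z> starts at 1
for layer in range(1, 6):
    rho = depolarize(rho, p)
    # Columns: layers applied, simulated <Z>, predicted (1 - 4p/3)^layers
    print(layer, expval(rho, Z), (1 - 4 * p / 3) ** layer)
```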

This notebook will empirically investigate all three phenomena.

## 🔬 2. The Challenge: A Unified Experiment Runner

To make this a more interesting challenge and avoid replicating code, your primary task is to complete a single, powerful `run_experiment` function. This function will be capable of testing all our hypotheses about barren plateaus by taking arguments that control the experimental conditions.
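Before filling in the PennyLane version, it may help to see the measurement procedure on its own: sample random parameters, take the gradient with respect to the first parameter, and compute the variance over trials. The sketch below is a stand-alone NumPy toy, not the intended solution. It swaps in two simplifications that are not in this notebook: a single `RY` per qubit instead of `qml.Rot`, and the parameter-shift rule instead of `qml.grad`. All function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Matrix of RY(theta) = exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, wire, n):
    """Apply a single-qubit gate to `wire` of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [wire]))
    return np.moveaxis(psi, 0, wire).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip `target` on the part of the state where `control` is 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    t_axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)

def cost(params, n):
    """<Z_0> after layers of RY rotations followed by a CNOT ladder."""
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for layer in params:  # params has shape (depth, n)
        for w in range(n):
            state = apply_1q(state, ry(layer[w]), w, n)
        for w in range(n - 1):
            state = apply_cnot(state, w, w + 1, n)
    probs = (np.abs(state) ** 2).reshape(2, -1).sum(axis=1)
    return probs[0] - probs[1]

def grad_first_param(params, n):
    """Parameter-shift gradient with respect to params[0, 0]."""
    shift = np.zeros_like(params)
    shift[0, 0] = np.pi / 2
    return 0.5 * (cost(params + shift, n) - cost(params - shift, n))

def gradient_variance(n, depth, n_trials=200, seed=0):
    """Variance of the first-parameter gradient over random initializations."""
    rng = np.random.default_rng(seed)
    grads = [
        grad_first_param(rng.uniform(0, 2 * np.pi, size=(depth, n)), n)
        for _ in range(n_trials)
    ]
    return float(np.var(grads))

for n in (2, 4, 6):
    # Depth grows with n here, mirroring the 'deep' rule described in this notebook
    print(n, gradient_variance(n, depth=n))
```

The parameter-shift rule is exact for gates of the form `exp(-i * theta * P / 2)` with `P` a Pauli operator, which is why it can stand in for automatic differentiation in this toy.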

### **Setup and Your Main Task**

**Your Task:** Complete the `run_experiment` function below by filling in the sections marked `--- YOUR CODE HERE ---`.

In [None]:
# --- Cell 1: Setup and Your Main Task ---

import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

In [None]:
# --- Experiment Parameters ---
QUBIT_COUNTS = range(2, 10, 2)
N_TRIALS = 50

# --- The Correct PQC Definition (Based on the Paper) ---
dev = qml.device('default.qubit', wires=max(QUBIT_COUNTS))

@qml.qnode(dev)
def pqc(params, observable, n_qubits, depth):
    """
    A PQC modeled on the recipe from McClean et al. (2018), here using
    general `Rot` rotations and a CNOT ladder.
    """
    wires = list(range(n_qubits))
    
    # --- Step 1: Initial, non-parameterized symmetry-breaking layer ---
    for i in wires:
        qml.RY(np.pi / 4, wires=i)
    
    # --- Step 2: Layered, parameterized ansatz ---
    for d in range(depth):
        # Layer of single-qubit rotations
        for i in wires:
            qml.Rot(*params[d, i], wires=i)
        # Layer of entangling gates
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
            
    return qml.expval(observable)

# --- Plotting function ---
def plot_all_variances(results, title):
    plt.style.use('seaborn-v0_8-whitegrid')
    plt.figure(figsize=(12, 7))
    
    plot_styles = {
        'Shallow / Local': {'color': 'blue', 'label': 'Part 1: Shallow, Local Cost (Trainable)'},
        'Deep / Local': {'color': 'red', 'label': 'Part 2: Deep, Local Cost (Depth-Induced BP)'},
        'Shallow / Global': {'color': 'green', 'label': 'Part 3: Shallow, Global Cost (Cost-Induced BP)'}
    }
    
    for key, variances in results.items():
        style = plot_styles[key]
        plt.plot(QUBIT_COUNTS, variances, 'o-', color=style['color'], label=style['label'])

    plt.yscale('log')
    plt.title(title, fontsize=16)
    plt.xlabel('Number of Qubits', fontsize=14)
    plt.ylabel('Gradient Variance (log scale)', fontsize=14)
    plt.legend(fontsize=12)
    plt.grid(True, which='both')
    plt.show()

print("Setup Complete. This version is a faithful replication of the paper's method.")

### **Running the Experiments & Visualizing the Results**

**Your Task:** Complete the `run_experiment` function in Cell 2 below, then run Cell 3 to execute all three core experiments and plot their results on a single graph for comparison.

In [None]:
# --- Cell 2: The Experiment Logic ---

def run_experiment(cost_type, depth_rule):
    """
    Runs one full, correct experiment to measure gradient variance.
    
    Args:
        cost_type (str): 'local' or 'global'.
        depth_rule (str): 'deep' (scales with n_qubits) or 'shallow'.
    """
    variances = []
    
    for n_qubits in tqdm(QUBIT_COUNTS, desc=f"Testing {depth_rule}/{cost_type}"):
        
        # --- YOUR TASK 1: Determine circuit depth based on the rule ---
        # If depth_rule is 'deep', depth should equal n_qubits.
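        # Otherwise, use a small fixed depth. Note that `shallow_depth` is not
        # defined in Cell 1: define it yourself as a constant that does not
        # grow with n_qubits.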
        # --- YOUR CODE HERE ---
        depth = 0

        # --- YOUR TASK 2: Define the observable ---
        # If cost_type is 'local', use PauliZ(0).
        # If cost_type is 'global', use the tensor product of PauliZ on all qubits.
        # hint: use qml.prod
        # --- YOUR CODE HERE ---
        observable = None
            
        # --- YOUR TASK 3: Create the gradient function ---
        # Ensure the correct 'depth' variable is passed to the PQC.
        # hint: use qml.grad and lambda function
        # --- YOUR CODE HERE ---
        grad_fn = None
        
        gradients = []
        for _ in range(N_TRIALS):
            # Always use wide initialization for a fair statistical sample.
            params_shape = (depth, n_qubits, 3)
            params = np.random.uniform(0, 2 * np.pi, size=params_shape)

            grad_val = grad_fn(params)[0, 0, 0] # Grad wrt first param
            gradients.append(grad_val)
            
        variances.append(np.var(gradients))
        
    return np.array(variances)

print("Experiment logic is ready. Proceed to the final cell.")

In [None]:
# --- Cell 3: Run All Tasks and See the Results ---

# This dictionary holds the results for our three main hypotheses.
results = {
    # Task 1: Our baseline. Should be trainable (high, flat variance).
    'Shallow / Local': 
        run_experiment(cost_type='local', depth_rule='shallow'),
    
    # Task 2: McClean et al. Should be a barren plateau (exponential decay).
    'Deep / Local': 
        run_experiment(cost_type='local', depth_rule='deep'),
        
    # Task 3: Cerezo et al. Should be a barren plateau (exponential decay).
    'Shallow / Global': 
        run_experiment(cost_type='global', depth_rule='shallow')
}

# --- Plot the results ---
title = "Demonstration: Gradient Variance vs. Qubits"
plot_all_variances(results, title)

## 3. Investigating Noise-Induced Barren Plateaus

Finally, we investigate the NIBP phenomenon described by **Wang et al. [3]**. We will return to our trainable baseline from Task 1 (shallow circuit, local cost) but apply a depolarizing noise channel to every qubit after each layer of the circuit.

**Hypothesis:** Noise acting on all qubits will induce a barren plateau, suppressing the gradient variance of our previously trainable setup.

**Challenge:** Implement a simple noise model and show that the gradient variance drops relative to the noiseless baseline.

In [None]:
# --- TASK 1: Define a noisy device and the noise strength ---
# A small but significant probability of a depolarizing error on each qubit.
NOISE_STRENGTH = 0.05 
# We must use a device that can simulate noise (density matrices).
dev_noisy = qml.device("default.mixed", wires=max(QUBIT_COUNTS))


# --- TASK 2: Define a new, noisy PQC ---
# YOUR TASK: This QNode should be executed on the noisy device.
# It should be identical to the pqc in Cell 1, but with an added
# layer of depolarizing noise at the end of each layer of the ansatz.
@qml.qnode(dev_noisy)
def pqc_noisy(params, observable, n_qubits, depth):
    """
    A PQC with a global depolarizing channel applied after each layer.
    """
    wires = list(range(n_qubits))
    
    # Initial, non-parameterized symmetry-breaking layer
    for i in wires:
        qml.RY(np.pi / 4, wires=i)
    
    # Main ansatz loop
    for d in range(depth):
        # Parameterized layer
        for i in wires:
            qml.Rot(*params[d, i], wires=i)
        # Entangling layer
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
            
        # --- Add a layer of global noise here ---
        # Add a qml.DepolarizingChannel with NOISE_STRENGTH to each qubit.
        # --- YOUR CODE HERE ---
            
    return qml.expval(observable)


# --- TASK 3: Run the NIBP experiment ---
# We will reuse the logic, but call the new noisy PQC.
def run_noisy_experiment(cost_type, depth_rule):
    variances = []
    for n_qubits in tqdm(QUBIT_COUNTS, desc=f"Testing Noisy {depth_rule}/{cost_type}"):
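        # --- YOUR CODE HERE: set depth and observable as in run_experiment ---
        depth = 0
        observable = None
        
        # Use the noisy PQC for this gradient calculation
        # --- YOUR CODE HERE: build grad_fn from pqc_noisy ---
        grad_fn = None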
        
        gradients = []
        for _ in range(N_TRIALS):
            params = np.random.uniform(0, 2 * np.pi, size=(depth, n_qubits, 3))
            grad_val = grad_fn(params)[0, 0, 0]
            gradients.append(grad_val)
        variances.append(np.var(gradients))
    return np.array(variances)

# Run the experiment on our "best-case" scenario.
variances_nibp = run_noisy_experiment(cost_type='local', depth_rule='shallow')


# --- Plot the NIBP result against the successful baseline case from Cell 3 ---
# Retrieve the successful baseline results for comparison
trainable_variances = results['Shallow / Local']

plt.style.use('seaborn-v0_8-whitegrid')
plt.figure(figsize=(12, 7))

plt.plot(QUBIT_COUNTS, trainable_variances, 'o-', color='blue', label='Part 1: Trainable Landscape (Noiseless)')
plt.plot(QUBIT_COUNTS, variances_nibp, 'o-', color='purple', label='Part 4: NIBP (Shallow, Local Cost, with Noise)')

plt.yscale('log')
plt.title('Demonstration: Noise-Induced Barren Plateau', fontsize=16)
plt.xlabel('Number of Qubits', fontsize=14)
plt.ylabel('Gradient Variance (log scale)', fontsize=14)
plt.legend(fontsize=12)
plt.grid(True, which='both')
plt.show()

### 3.1 Bonus: Try a different circuit design to reduce the noise-induced loss of gradient variance

## 🚀 4. Bonus Challenge: Algorithmic Mitigation with ADAPT-VQE

One effective mitigation strategy is to not use a fixed ansatz at all, but to build one iteratively. The **ADAPT-VQE** algorithm, introduced by **Grimsley et al. [4]**, does exactly this. It is designed to sidestep barren plateaus: at each step it appends only the pool operator with the largest gradient magnitude, which keeps the circuit as shallow as possible.

**Algorithm:**
1.  Define a "pool" of operators (e.g., all possible Pauli strings).
2.  Start with a simple reference state (e.g., the `|0...0>` state).
3.  **Loop:**
    a. For each operator in the pool, calculate the gradient of the cost function with respect to the parameter that operator would introduce if appended to the current ansatz.
    b. Select the operator with the largest gradient magnitude. This is the direction of steepest descent.
    c. Add this operator to the end of the current ansatz.
    d. Optimize all parameters in the newly extended ansatz.
4.  Repeat until the largest gradient in the pool is below a threshold.
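
The selection step (3a and 3b) can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not a full ADAPT-VQE implementation: the two-qubit Hamiltonian `H = 0.5 * X0 + Z0 Z1`, the four-operator pool, and the function names are all hypothetical, and each pool operator `P` is assumed to enter the ansatz as `exp(-i * theta * P / 2)`. Under that convention the gradient at `theta = 0` is `(i/2) <psi|[P, H]|psi>`, so no circuit needs to be extended to score the candidates.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron(*ops):
    """Tensor product of single-qubit operators, qubit 0 first."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Hypothetical toy problem, chosen only for illustration
H = 0.5 * kron(X, I2) + kron(Z, Z)
pool = {
    "X0": kron(X, I2),
    "Y0": kron(Y, I2),
    "X1": kron(I2, X),
    "Y1": kron(I2, Y),
}

def pool_gradient(state, generator, hamiltonian):
    """dE/dtheta at theta = 0 when exp(-i*theta*P/2) is appended to the ansatz."""
    commutator = generator @ hamiltonian - hamiltonian @ generator
    return float(np.real(0.5j * np.vdot(state, commutator @ state)))

def select_operator(state, pool, hamiltonian):
    """Steps 3a and 3b: score every pool operator, return the steepest one."""
    grads = {name: pool_gradient(state, op, hamiltonian) for name, op in pool.items()}
    best = max(grads, key=lambda name: abs(grads[name]))
    return best, grads

reference = np.zeros(4, dtype=complex)
reference[0] = 1.0  # the |00> reference state
best, grads = select_operator(reference, pool, H)
print(best, grads)
```

A full implementation would then append the selected operator, re-optimize all parameters (step 3d), and repeat until the largest entry of `grads` falls below the chosen threshold.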

Implementing this is a significant but rewarding challenge that demonstrates a true solution to the problem.

## 🏁 5. Conclusion

This investigation empirically confirmed the primary theoretical causes of barren plateaus in VQAs. We demonstrated that deep, randomly initialized circuits and the use of global cost functions are significant obstacles to trainability. Furthermore, we showed that even an otherwise well-behaved VQA can be rendered untrainable by the presence of global hardware noise, confirming the existence of NIBPs.

Simple mitigation strategies, such as keeping the circuit shallow and using local observables, proved effective against their corresponding barren plateau mechanisms. A robust VQA needs a holistic approach that considers every potential pitfall, from ansatz design to cost function definition and noise resilience. More advanced, adaptive methods like ADAPT-VQE offer a promising path forward for constructing scalable and trainable models.

### Write summary

In the markdown cell below, please describe your findings.
*   Which mitigation strategy performed the best and why?
*   Why did the global cost function fail to train, even with a shallow circuit?
*   What are the trade-offs for each method (e.g., implementation complexity)?
*   Did you try the bonus challenge or any other creative ideas? If so, what were they and how did they perform?

---

*... Your analysis here ...*


| Strategy                                | Implementation Complexity (Trivial/High) | Performance (Failed/Success) | Key Takeaway |
| --------------------------------------- | ---------------------------------------- | ---------------------------- | ------------ |
| Shallow Circuit / Local Cost            | (write here)                             | (write here)                 | (write here) |
| Deep Circuit / Local Cost               | (write here)                             | (write here)                 | (write here) |
| Shallow Circuit / Global Cost           | (write here)                             | (write here)                 | (write here) |
| Shallow Circuit / Local Cost with Noise | (write here)                             | (write here)                 | (write here) |
| ADAPT-VQE (Bonus)                       | (write here)                             | (write here)                 | (write here) |

## 📖 6. References

[1] McClean, J. R., et al. "Barren plateaus in quantum neural network training landscapes." *Nature Communications* 9.1 (2018): 4812.

[2] Cerezo, M., et al. "Cost function dependent barren plateaus in shallow parametrized quantum circuits." *Nature Communications* 12.1 (2021): 1791.

[3] Wang, S., et al. "Noise-induced barren plateaus in variational quantum algorithms." *Nature Communications* 12.1 (2021): 6961.

[4] Grimsley, H. R., et al. "An adaptive variational algorithm for exact molecular simulations on a quantum computer." *Nature Communications* 10.1 (2019): 3007.