In [None]:
import tensorflow as tf
import numpy as np
import random
import time
from math import sqrt

try:
    from config import config
    from DilatedRNN import DilatedRNNWavefunction
    from utils import Fullyconnected_localenergies, Fullyconnected_diagonal_matrixelements
    from vca import vca_solver
    print("Successfully imported local modules.")
except ImportError:
    print("Local .py files not found. Please ensure they are in the same directory or paste the classes below.")

tf.compat.v1.disable_eager_execution()


Successfully imported local modules.


# Section 1: Ising Model as Energy Minimization

## 1.1 What is the Ising Model?
The Ising model was originally developed in statistical physics to describe the **ferromagnetism** of solid materials. It assumes that a system consists of a collection of **spins** arranged on a lattice, where each spin can only take one of two discrete values:
* $s_i = +1$ (Spin Up)
* $s_i = -1$ (Spin Down)

In this implementation, the spins are mapped to binary values $s \in \{0, 1\}$ to be compatible with neural network processing.
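
The two conventions are related by a simple affine map. A minimal sketch, assuming that binary `1` corresponds to spin up (as the comments in `Fullyconnected_diagonal_matrixelements` suggest); the helper names `binary_to_spin` and `spin_to_binary` are illustrative and not part of the codebase:

```python
import numpy as np

def binary_to_spin(b):
    """Map binary values {0, 1} to Ising spins {-1, +1}."""
    return 2 * np.asarray(b) - 1

def spin_to_binary(s):
    """Map Ising spins {-1, +1} back to binary values {0, 1}."""
    return (np.asarray(s) + 1) // 2

sample = np.array([0, 1, 1, 0])
spins = binary_to_spin(sample)      # array([-1,  1,  1, -1])
recovered = spin_to_binary(spins)   # array([0, 1, 1, 0])
```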

## 1.2 Problem Type: Combinatorial Optimization
From a computer science perspective, the Ising model is a classic **Combinatorial Optimization Problem**:
* **Goal**: To find a specific arrangement among $2^N$ possible discrete configurations that minimizes the total energy of the system.
* **Applications**: Many famous NP-hard problems, such as the **Max-Cut Problem**, **Traveling Salesperson Problem (TSP)**, or **Graph Coloring**, can be exactly mapped onto the energy minimization of an Ising model.
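
As an illustration of such a mapping, Max-Cut on a weighted graph becomes an Ising problem by setting $J_{ij} = -w_{ij}$, which gives $\text{cut}(\mathbf{s}) = \frac{1}{2}\left(\sum_{i<j} w_{ij} - E(\mathbf{s})\right)$. The sketch below checks this by enumeration; the 4-node graph `W` and the helper names are made up for the example:

```python
import itertools
import numpy as np

# Hypothetical 4-node weighted graph (symmetric weights, zero diagonal).
W = np.array([[0, 1, 2, 0],
              [1, 0, 1, 3],
              [2, 1, 0, 1],
              [0, 3, 1, 0]], dtype=float)

J = -W  # antiferromagnetic couplings: cutting an edge lowers the energy

def ising_energy(J, s):
    """E(s) = -sum_{i<j} J_ij s_i s_j for spins s_i in {-1, +1}."""
    return -np.sum(np.triu(J, k=1) * np.outer(s, s))

def cut_value(W, s):
    """Total weight of edges whose endpoints carry opposite spins."""
    return np.sum(np.triu(W, k=1) * (1 - np.outer(s, s)) / 2)

total_weight = np.sum(np.triu(W, k=1))
configs = [np.array(c) for c in itertools.product([-1, 1], repeat=len(W))]

# Minimizing the energy is the same as maximizing the cut.
best_by_energy = min(configs, key=lambda s: ising_energy(J, s))
best_cut = max(cut_value(W, s) for s in configs)
```

Here the spin value labels which side of the cut a node falls on.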

## 1.3 Mathematical Nature & The Hamiltonian
The **Hamiltonian** is a function in physics that represents the total energy of a system. For a fully connected Ising model with spins $s_i \in \{-1, +1\}$, the energy function $E(\mathbf{s})$ is defined as:

$$E(\mathbf{s}) = -\sum_{i < j} J_{ij} s_i s_j - \sum_i h_i s_i$$

* **$J_{ij}$ (Interaction Term)**: Defines the coupling strength between spins $i$ and $j$.
    * If $J_{ij} > 0$: Spins tend to align in the same direction (Ferromagnetic).
    * If $J_{ij} < 0$: Spins tend to align in opposite directions (Anti-ferromagnetic).
* **$h_i$ (External Field)**: Represents the influence of an external magnetic field or bias on an individual spin. Separately from this longitudinal term, the code introduces a transverse field `Bx`, which adds off-diagonal perturbations to the Hamiltonian.
* **Code Example**: In `utils.py`, the `Jz` matrix stores all interaction weights, and the function `Fullyconnected_diagonal_matrixelements` calculates the classical energy based on these weights.

In [5]:
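def Fullyconnected_diagonal_matrixelements(Jz, samples):
    """Classical (diagonal) energies of binary samples under the couplings Jz.

    Jz: (N, N) coupling matrix; only the upper triangle (i < j) is read.
    samples: (numsamples, N) array with entries in {0, 1}.
    """
    numsamples = samples.shape[0]
    N = samples.shape[1]
    energies = np.zeros(numsamples, dtype=np.float64)

    for i in range(N - 1):
        # Sum of spin i with every later spin j > i: the result is 0, 1, or 2
        values = np.expand_dims(samples[:, i], axis=-1) + samples[:, i + 1:]
        valuesT = np.copy(values)
        valuesT[values == 2] = +1  # both spins are up
        valuesT[values == 0] = +1  # both spins are down
        valuesT[values == 1] = -1  # the spins are opposite

        energies += np.sum(valuesT * (-Jz[i, i + 1:]), axis=1)

    return energies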
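
The same quantity can be written without the explicit recoding step by mapping the samples to $\pm 1$ first. The following is a sketch of an equivalent vectorized form, not a function from `utils.py`; the name `diagonal_energies_vectorized` is hypothetical, and the random `Jz` and `samples` are test data only:

```python
import numpy as np

def diagonal_energies_vectorized(Jz, samples):
    """Energies -sum_{i<j} Jz[i, j] * s_i * s_j for binary samples in {0, 1}."""
    spins = 2.0 * samples - 1.0       # map {0, 1} -> {-1, +1}
    J_upper = np.triu(Jz, k=1)        # count each pair i < j exactly once
    return -np.einsum("bi,ij,bj->b", spins, J_upper, spins)

rng = np.random.default_rng(0)
N, numsamples = 6, 5
Jz = rng.normal(size=(N, N))
samples = rng.integers(0, 2, size=(numsamples, N))

# Reference: explicit double loop over pairs, straight from the definition.
reference = np.zeros(numsamples)
for b in range(numsamples):
    s = 2 * samples[b] - 1
    for i in range(N):
        for j in range(i + 1, N):
            reference[b] -= Jz[i, j] * s[i] * s[j]

energies = diagonal_energies_vectorized(Jz, samples)
```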

## 1.4 The Essence of NP-hardness
Finding the lowest energy state (the **Ground State**) of an Ising model with general couplings is extremely difficult for the following reasons:

1.  **Exponential State Space**: For $N$ spins, there are $2^N$ possible states. Brute-force search becomes infeasible as $N$ grows.
2.  **Frustration**: When the distribution of $J_{ij}$ is complex (as in Spin Glass models), the system cannot satisfy all coupling terms simultaneously.
3.  **Complex Energy Landscape**: Frustration leads to a highly non-convex energy landscape filled with numerous **Local Minima**.
4.  **Mathematical Bottleneck**: High energy barriers separate these minima, so local-search algorithms easily get stuck in a local optimum and fail to find the global one. Formally, NP-hardness is a worst-case statement (no polynomial-time algorithm is known for general couplings); the features above explain why heuristic methods struggle in practice.
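
Both the exponential cost and frustration can be seen on the smallest frustrated system, a triangle with antiferromagnetic couplings $J_{ij} = -1$. The sketch below enumerates all $2^N$ states; the function name `brute_force_ground_state` is illustrative:

```python
import itertools
import numpy as np

def brute_force_ground_state(J):
    """Enumerate all 2^N spin configurations; only feasible for small N."""
    N = J.shape[0]
    J_upper = np.triu(J, k=1)
    best_energy, best_states = np.inf, []
    for config in itertools.product([-1, 1], repeat=N):
        s = np.array(config)
        energy = -s @ J_upper @ s
        if energy < best_energy - 1e-12:
            best_energy, best_states = energy, [config]
        elif abs(energy - best_energy) <= 1e-12:
            best_states.append(config)
    return best_energy, best_states

# Antiferromagnetic triangle: every pair wants to anti-align.
J_triangle = -np.ones((3, 3))
energy, states = brute_force_ground_state(J_triangle)
print(energy, len(states))  # -1.0 6
```

If all three bonds could be satisfied at once the energy would be $-3$; because at most two can be, the minimum is $-1$ and it is shared by six degenerate states.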

# Section 2: From Simulated Annealing to Variational Formulation

## 2.1 The Boltzmann Distribution
In statistical mechanics, a system at thermal equilibrium at temperature $T$ is described by the **Boltzmann distribution**. The probability of the system being in a specific state $\mathbf{s}$ is given by:

$$P_B(\mathbf{s}) = \frac{e^{-E(\mathbf{s})/T}}{Z}$$

* **$E(\mathbf{s})$**: The energy of configuration $\mathbf{s}$.
* **$T$**: The temperature, in units where the Boltzmann constant $k_B = 1$; it effectively controls the "noise" level.
* **$Z$**: The partition function $Z = \sum_{\mathbf{s}} e^{-E(\mathbf{s})/T}$, which ensures the total probability sums to 1.

**Physical Intuition**: 
* At **high temperatures**, $P_B(\mathbf{s})$ becomes nearly uniform, allowing the system to explore the state space freely.
* At **low temperatures** ($T \to 0$), the distribution concentrates all its probability mass on the **Ground State** (the global minimum of energy).
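
For a system small enough to enumerate, both limits can be checked directly. The sketch below uses a 4-spin fully connected ferromagnet ($J_{ij} = 1$) chosen purely for illustration:

```python
import itertools
import numpy as np

def boltzmann_distribution(energies, T):
    """Return P_B(s) = exp(-E(s)/T) / Z for an explicit array of energies."""
    # Shifting by the minimum energy leaves P_B unchanged and avoids overflow.
    weights = np.exp(-(energies - energies.min()) / T)
    return weights / weights.sum()

N = 4
configs = np.array(list(itertools.product([-1, 1], repeat=N)))
J_upper = np.triu(np.ones((N, N)), k=1)
energies = -np.einsum("bi,ij,bj->b", configs, J_upper, configs)

p_hot = boltzmann_distribution(energies, T=100.0)   # nearly uniform
p_cold = boltzmann_distribution(energies, T=0.05)   # concentrated on ground states
ground_mass = p_cold[energies == energies.min()].sum()
```

At `T=100` the most and least likely states differ by less than 10% in probability, while at `T=0.05` essentially all of the mass sits on the two fully aligned states.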

## 2.2 Motivation: Why Variational?
Traditional **Simulated Annealing (SA)** relies on Markov Chain Monte Carlo (MCMC) sampling. However, SA faces significant challenges:
1.  **Mixing Time**: In complex energy landscapes (like Spin Glasses), MCMC can get trapped in local minima for an exponentially long time.
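2.  **Normalization**: The partition function $Z$ is computationally intractable for large systems. MCMC sidesteps $Z$ by using only energy differences between states, but as a result it never provides normalized probabilities, so the entropy and free energy of the sampled distribution are not directly available.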
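
For reference, a bare-bones version of classical SA looks like the sketch below. It assumes single-spin-flip Metropolis updates and a geometric cooling schedule; the function name and default parameter values are illustrative and do not come from the codebase:

```python
import numpy as np

def simulated_annealing(J, T_start=3.0, T_end=0.05, n_sweeps=200, seed=0):
    """Single-spin-flip Metropolis annealing for E(s) = -sum_{i<j} J_ij s_i s_j."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    J_sym = np.triu(J, k=1)
    J_sym = J_sym + J_sym.T                      # symmetric couplings, zero diagonal
    s = rng.choice([-1, 1], size=N)
    energy = -0.5 * s @ J_sym @ s
    best_energy, best_state = energy, s.copy()
    for T in np.geomspace(T_start, T_end, n_sweeps):
        for i in rng.permutation(N):
            delta = 2.0 * s[i] * (J_sym[i] @ s)  # energy change if spin i flips
            # Only the energy difference enters the acceptance rule.
            if delta <= 0 or rng.random() < np.exp(-delta / T):
                s[i] = -s[i]
                energy += delta
                if energy < best_energy:
                    best_energy, best_state = energy, s.copy()
    return best_energy, best_state

# Unfrustrated test case: a 6-spin ferromagnet, whose ground states are fully aligned.
best_energy, best_state = simulated_annealing(np.ones((6, 6)))
```

On this easy instance the chain finds a fully aligned state quickly; the difficulties listed above appear only once the couplings are frustrated.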

**The Variational Solution**:
Instead of using MCMC to sample from the unknown $P_B(\mathbf{s})$, we introduce a **Variational Distribution** $q_\theta(\mathbf{s})$ (represented by our Neural Network). We then optimize the parameters $\theta$ to make $q_\theta(\mathbf{s})$ as close as possible to the target Boltzmann distribution $P_B(\mathbf{s})$.

## 2.3 The Objective: Minimizing Free Energy
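The "closeness" between our model $q_\theta(\mathbf{s})$ and the physical distribution $P_B(\mathbf{s})$ is measured by the **Kullback-Leibler (KL) Divergence** $D_{\mathrm{KL}}(q_\theta \,\|\, P_B)$. At a fixed temperature $T$, minimizing this KL divergence over $\theta$ is mathematically equivalent to minimizing the **Variational Free Energy** $F_q$, because the two differ only by a positive factor and a $\theta$-independent constant: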

$$F_q = \langle E \rangle_{q_\theta} - T \cdot S[q_\theta]$$

* **$\langle E \rangle_{q_\theta}$**: The expected energy under our model.
* **$S[q_\theta] = -\sum_{\mathbf{s}} q_\theta(\mathbf{s}) \ln q_\theta(\mathbf{s})$**: The Shannon entropy of our model, which encourages exploration.
* **Code Example**: In `vca_solver`, the variables `meanFreeEnergy` and `varFreeEnergy` track this quantity as the model trains.
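
The equivalence follows from $D_{\mathrm{KL}}(q \,\|\, P_B) = (F_q - F)/T$, where $F = -T \ln Z$ is the exact free energy and does not depend on $\theta$. The sketch below verifies this numerically on a 4-spin system with random couplings; the trial distribution `q` is an arbitrary normalized vector standing in for $q_\theta$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 0.7
configs = np.array(list(itertools.product([-1, 1], repeat=N)))
J_upper = np.triu(rng.normal(size=(N, N)), k=1)
energies = -np.einsum("bi,ij,bj->b", configs, J_upper, configs)

# Exact Boltzmann distribution and exact free energy F = -T ln Z.
Z = np.sum(np.exp(-energies / T))
p_boltzmann = np.exp(-energies / T) / Z
F_exact = -T * np.log(Z)

# Arbitrary normalized trial distribution (a stand-in for q_theta).
q = rng.random(len(configs))
q /= q.sum()

mean_energy = np.sum(q * energies)
entropy = -np.sum(q * np.log(q))
F_q = mean_energy - T * entropy
kl = np.sum(q * np.log(q / p_boltzmann))
```

Since the KL divergence is non-negative, $F_q \geq F$ for every trial distribution, with equality only when $q = P_B$.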

## 2.4 Variational Annealing Strategy
In the provided code, we perform **Variational Annealing**:
1.  Start at a high temperature $T_0$ where the free energy is easy to minimize.
2.  Gradually decrease $T$ (and the transverse field $B_x$) according to an annealing schedule.
3.  At each step, update the RNN parameters $\theta$ to track the shifting Boltzmann distribution.
4.  By the time $T \to 0$, $q_\theta(\mathbf{s})$ should ideally collapse onto the global energy minimum.
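
The schedule itself is not shown in this notebook. As a minimal sketch, assuming a linear decay to zero (one common choice, not necessarily what `vca_solver` uses), the control parameters could be generated as follows; `linear_schedule` and the numerical values are hypothetical:

```python
def linear_schedule(initial_value, step, num_steps):
    """Linearly decay a control parameter from initial_value to 0 over num_steps."""
    fraction = min(max(step / num_steps, 0.0), 1.0)
    return initial_value * (1.0 - fraction)

T0, Bx0, num_annealing_steps = 2.0, 1.0, 5   # illustrative values only
schedule = [(linear_schedule(T0, t, num_annealing_steps),
             linear_schedule(Bx0, t, num_annealing_steps))
            for t in range(num_annealing_steps + 1)]
```

Each `(T, Bx)` pair would be held fixed while the network parameters are updated for a number of training steps before moving to the next pair.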

# Section 3: Variational Neural Annealing (VNA)

## 3.1 The Variational Policy: $q_\theta(\mathbf{s})$
In VNA, the probability distribution over spin configurations is represented by a Neural Network (the **Dilated RNN** in your code). 

### Autoregressive Property
The mathematical essence of using an RNN is its **autoregressive** nature. The joint probability of a configuration $\mathbf{s} = (s_1, s_2, \dots, s_N)$ is decomposed into a product of conditional probabilities:
$$q_\theta(\mathbf{s}) = \prod_{i=1}^N q_\theta(s_i | s_{<i})$$

* **Sampling**: The code implements this in the `sample` method. It generates $s_1$, then feeds $s_1$ back into the RNN to generate $s_2$, and so on.
* **Normalization**: Because each conditional is itself a normalized distribution, the product is **guaranteed to be normalized** ($\sum_{\mathbf{s}} q_\theta(\mathbf{s}) = 1$). Unlike the Boltzmann weights $e^{-E(\mathbf{s})/T}$, it therefore does not require the intractable partition function $Z$.
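
Both properties hold for any autoregressive model, not only for an RNN. The sketch below replaces the Dilated RNN with a small logistic model (an assumption made purely to keep the example short) and confirms by enumeration that the probabilities sum to one:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 5
W = np.tril(rng.normal(size=(N, N)), k=-1)   # strictly lower triangular: site i sees only s_<i
b = rng.normal(size=N)

def conditional_p1(s, i):
    """q(s_i = 1 | s_<i) from a logistic model (a stand-in for the RNN)."""
    return 1.0 / (1.0 + np.exp(-(b[i] + W[i, :i] @ s[:i])))

def sample(rng):
    """Draw one configuration site by site, feeding earlier spins forward."""
    s = np.zeros(N, dtype=int)
    for i in range(N):
        s[i] = rng.random() < conditional_p1(s, i)
    return s

def log_prob(s):
    """ln q(s) as a sum of conditional log-probabilities."""
    total = 0.0
    for i in range(N):
        p1 = conditional_p1(s, i)
        total += np.log(p1 if s[i] == 1 else 1.0 - p1)
    return total

total_probability = sum(np.exp(log_prob(np.array(c)))
                        for c in itertools.product([0, 1], repeat=N))
one_sample = sample(np.random.default_rng(1))
```

Sampling costs $N$ sequential steps and yields exact, uncorrelated samples, in contrast to an MCMC chain.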

### Dilated Structure
The model uses a **Dilated RNN** in which recurrent connections skip steps: layer $i$ at site $n$ receives the hidden state from site `n - 2**i` rather than only from the immediately preceding site. This allows the model to capture long-range correlations between spins that are far apart in the sequence but physically coupled in the Hamiltonian.
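
The index rule `n - 2**i` can be tabulated directly. This sketch only lists which earlier site feeds each site in each layer; it is not the `DilatedRNNWavefunction` implementation, and how out-of-range sites are handled (`None` here) is an assumption:

```python
def dilated_source_sites(num_sites, num_layers):
    """For each layer i, map site n to the earlier site n - 2**i (None if out of range)."""
    return [[n - 2**i if n - 2**i >= 0 else None for n in range(num_sites)]
            for i in range(num_layers)]

connections = dilated_source_sites(num_sites=8, num_layers=3)
# Layer 0 looks 1 site back, layer 1 looks 2 sites back, layer 2 looks 4 sites back.
for layer, sources in enumerate(connections):
    print(layer, sources)
```

With dilations that double per layer, information from any earlier site can reach site $n$ through a number of hops that grows only logarithmically with their separation.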

## 3.2 The Objective Function
The goal is to minimize the **Variational Free Energy**:
$$F_q(\theta) = \mathbb{E}_{\mathbf{s} \sim q_\theta} [E(\mathbf{s}) + T \ln q_\theta(\mathbf{s})]$$

### The Gradient Challenge
Since the spin configurations $\mathbf{s}$ are discrete (0 or 1), we cannot propagate gradients directly through the samples. To solve this, we use the **Policy Gradient (REINFORCE)** theorem from Reinforcement Learning.

The gradient of the Free Energy with respect to the network parameters $\theta$ is:
$$\nabla_\theta F_q \approx \mathbb{E}_{\mathbf{s} \sim q_\theta} \left[ \nabla_\theta \ln q_\theta(\mathbf{s}) \cdot \left( F_{loc}(\mathbf{s}) - \bar{F} \right) \right]$$

Where:
* **$F_{loc}(\mathbf{s}) = E(\mathbf{s}) + T \ln q_\theta(\mathbf{s})$** is the "local" free energy of a sample.
* **$\bar{F}$** is the average free energy of the batch, acting as a baseline to reduce variance.
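
The estimator can be checked on a model small enough to enumerate. The sketch below swaps the RNN for a product of independent Bernoulli variables (an assumption made for brevity), evaluates the expectation exactly instead of by sampling, and compares it with a finite-difference gradient of $F_q$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 0.5
configs = np.array(list(itertools.product([0, 1], repeat=N)))   # binary samples
spins = 2 * configs - 1
J_upper = np.triu(rng.normal(size=(N, N)), k=1)
energies = -np.einsum("bi,ij,bj->b", spins, J_upper, spins)

def probs(theta):
    """q_theta(s) for independent Bernoullis with P(s_i = 1) = sigmoid(theta_i)."""
    p = 1.0 / (1.0 + np.exp(-theta))
    return np.prod(np.where(configs == 1, p, 1.0 - p), axis=1)

def free_energy(theta):
    """F_q = sum_s q(s) * (E(s) + T ln q(s))."""
    q = probs(theta)
    return np.sum(q * (energies + T * np.log(q)))

def score_function_gradient(theta):
    """E_q[ grad ln q * (F_loc - F_bar) ], evaluated exactly by enumeration."""
    q = probs(theta)
    p = 1.0 / (1.0 + np.exp(-theta))
    grad_log_q = configs - p                      # d ln q / d theta_i = s_i - p_i
    F_loc = energies + T * np.log(q)
    F_bar = np.sum(q * F_loc)
    return np.sum(q[:, None] * grad_log_q * (F_loc - F_bar)[:, None], axis=0)

theta = rng.normal(size=N)
eps = 1e-6
finite_diff = np.array([
    (free_energy(theta + eps * e) - free_energy(theta - eps * e)) / (2 * eps)
    for e in np.eye(N)
])
gradient = score_function_gradient(theta)
```

With exact expectations the two gradients agree to numerical precision; in training, the expectation and $\bar{F}$ are replaced by averages over a finite batch, which is where the approximation enters.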

### Code Implementation: The "Fake" Cost Function
In your `vca_solver`, this is implemented using a "stop gradient" trick to force TensorFlow to compute the correct policy gradient:

In [3]:
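# Excerpt from vca_solver, shown for reference only. It is not runnable on its own:
# Eloc, T_placeholder, and log_probs_forgrad are tensors built inside vca_solver.
Floc = Eloc + T_placeholder * log_probs_forgrad
# stop_gradient treats Floc as a constant, so differentiating `cost` yields
# mean(grad(log q) * (Floc - mean(Floc))), the baseline-subtracted policy gradient.
cost = tf.reduce_mean(tf.multiply(log_probs_forgrad, tf.stop_gradient(Floc))) \
       - tf.reduce_mean(log_probs_forgrad) * tf.reduce_mean(tf.stop_gradient(Floc))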

# Section 4: Algorithmic Implementation in Jupyter

## 4.1 Adapting from Command Line to Notebook
In a standard environment, you would run the solver via the command line:
`python vca.py ../../dataset/EA/EA_10x10/10x10_uniform_seed1.txt`

In this Notebook, we instantiate the `config` class directly with the path to your problem instance file. This allows us to inspect variables and visualize progress in real-time.

## 4.2 The Sampling and Evaluation Loop
The core of the implementation involves three main stages that repeat during the annealing process:

1. **Sampling**: The RNN generates a batch of spin configurations $\mathbf{s}$ using the `sample` method. 
2. **Energy Evaluation**:
    * **Diagonal Elements**: Calculated using `Fullyconnected_diagonal_matrixelements` to get the classical Ising energy.
    * **Local Energies**: The `Fullyconnected_localenergies` function computes the off-diagonal contributions from the transverse field $B_x$. This involves flipping spins and re-evaluating probabilities.
3. **Gradient Update**: The `optstep` is executed using the "Fake Cost" derived in Section 3, updating the RNN parameters to favor lower free energy.

## 4.3 Execution Cell
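To run the solver in this Notebook, use the following code block. Ensure the problem instance file (e.g., `10x10_uniform_seed1.txt`) exists at the location assigned to `instance_path`; a relative path is resolved against the notebook's working directory, and a missing file raises `FileNotFoundError`.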

In [4]:
# 1. Path to your problem instance
instance_path = "../../dataset/EA/EA_10x10/10x10_uniform_seed1.txt" 
seed = 0

# 2. Initialize configuration
vca_config = config(instance_path, seed)

# 3. Run the solver
# This will output the annealing progress, energy (E), and free energy (F)
mean_energies, min_energies = vca_solver(vca_config)

print("\nFinal Results:")
print(f"Minimum Energy Found: {min_energies}")
print(f"Mean Energy: {mean_energies}")
