<a href="https://colab.research.google.com/github/dvoils/neural-network-experiments/blob/main/energy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

Hopfield (1982) proposed that certain computational properties useful to organisms and computers can emerge **collectively** from large assemblies of simple interacting units (neurons). The computation does not require elaborate circuitry. It arises spontaneously from the interactions, much as collective behavior arises in physical systems such as magnetic domains or fluid vortices. The paper presents a model that exhibits **content-addressable memory**, error correction, generalization, and categorization. All of these emerge from the network's dynamics rather than from programmed instructions.


## Content-Addressable Memory and Dynamics

Let a memory be represented as a point $\mathbf{X} \in \mathbb{R}^N$. In certain physical systems (e.g., Ising models), dynamics defined by gradient descent in energy space drive states toward stable attractors:

$$
\frac{d\mathbf{X}}{dt} = -\nabla E(\mathbf{X})
$$

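This system acts as a **content-addressable memory** if every state $\mathbf{X}'$ sufficiently close to a stored memory $\mathbf{X}_a$ (a partial or noisy version of it) flows to $\mathbf{X}_a$. In other words, $\mathbf{X}_a$ is a stable point whose basin of attraction contains its nearby states. Hopfield demonstrates that such dynamics can recover full memories from fragments.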
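The following is a minimal sketch of this idea. It integrates $d\mathbf{X}/dt = -\nabla E(\mathbf{X})$ with forward Euler steps. The energy $E(\mathbf{X}) = \tfrac14\sum_i (X_i^2 - 1)^2$ is an illustrative assumption and is not Hopfield's energy. Its minima are the corners $X_i = \pm 1$, so any corner can stand in for a stored memory $\mathbf{X}_a$. A noisy cue that keeps every coordinate's sign then flows back to that corner. The step size, number of steps, and seed are arbitrary.

```python
import numpy as np

def grad_E(x):
    """Gradient of the toy energy E(x) = (1/4) * sum_i (x_i**2 - 1)**2 (an illustrative assumption)."""
    return x * (x**2 - 1)

def gradient_flow(x0, dt=0.1, steps=500):
    """Forward-Euler integration of dX/dt = -grad E(X)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= dt * grad_E(x)
    return x

rng = np.random.default_rng(0)
x_a = rng.choice([-1.0, 1.0], size=8)             # the "memory": one corner of the cube
x_cue = x_a + rng.uniform(-0.5, 0.5, size=8)      # noisy version X' that keeps every sign
x_final = gradient_flow(x_cue)
print(np.abs(x_final - x_a).max())
```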

## The Hopfield Model

Each of the $N$ neurons is binary:

$$
V_i \in \{0, 1\} \quad \text{or equivalently} \quad s_i = 2V_i - 1 \in \{-1, +1\}
$$

Neurons update asynchronously, each one sampling its input at random times, using the rule:

$$
V_i \leftarrow
\begin{cases}
1 & \text{if } \sum_j T_{ij} V_j > U_i \\
0 & \text{otherwise}
\end{cases}
$$

where $T_{ij}$ is the strength of the connection from neuron $j$ to neuron $i$, and $U_i$ is the threshold of neuron $i$ (often taken to be 0).
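Here is a minimal NumPy sketch of the asynchronous threshold rule. It assumes that neurons are visited one at a time in a fresh random order on each sweep, which approximates Hopfield's random update times. The helper name `async_update` and the sweep structure are illustrative choices. The one-pattern coupling matrix in the usage lines anticipates the storage rule in the next section.

```python
import numpy as np

def async_update(V, T, U=None, rng=None, max_sweeps=100):
    """Asynchronous threshold dynamics for 0/1 neurons (hypothetical helper).

    V : initial state, sequence of 0/1 values
    T : (N, N) coupling matrix T_ij (zero diagonal)
    U : thresholds U_i; defaults to all zeros
    """
    rng = np.random.default_rng() if rng is None else rng
    V = np.array(V, dtype=int)
    N = len(V)
    U = np.zeros(N) if U is None else np.asarray(U, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(N):            # one neuron at a time, random order
            new = 1 if T[i] @ V > U[i] else 0   # V_i <- 1 if sum_j T_ij V_j > U_i
            if new != V[i]:
                V[i] = new
                changed = True
        if not changed:                         # a full sweep without flips: stable state
            break
    return V

# Usage: couplings for a single stored pattern v, then recall from a cue with bit 0 flipped.
v = np.array([1, 0, 1, 0, 1, 0])
s = 2 * v - 1
T = np.outer(s, s)
np.fill_diagonal(T, 0)
print(async_update([0, 0, 1, 0, 1, 0], T, rng=np.random.default_rng(0)))
```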


## Hebbian Learning Rule

To store a set of binary patterns $\{\mathbf{V}^s\}_{s=1}^n$, Hopfield applies the Hebbian learning rule:

$$
T_{ij} = \sum_{s=1}^{n} (2V_i^s - 1)(2V_j^s - 1), \quad T_{ii} = 0
$$
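The storage rule is a sum of outer products of the $\pm 1$ versions of the patterns, with the diagonal zeroed. The sketch below is a minimal version under that reading. The helper name `hebbian_T` and the two small example patterns are illustrative.

```python
import numpy as np

def hebbian_T(patterns):
    """Hopfield storage rule T_ij = sum_s (2V_i^s - 1)(2V_j^s - 1), T_ii = 0.

    patterns : (n, N) array whose rows are 0/1 patterns V^s
    """
    S = 2 * np.asarray(patterns) - 1   # map {0, 1} -> {-1, +1}
    T = S.T @ S                        # sum over s of the outer products of the +/-1 patterns
    np.fill_diagonal(T, 0)             # no self-connections
    return T

T = hebbian_T([[1, 0, 1], [1, 1, 0]])
print(T)
```

Because each term $(2V_i^s - 1)(2V_j^s - 1)$ is symmetric in $i$ and $j$, the resulting matrix is symmetric. The energy argument below relies on this property.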

When the number of random patterns is small relative to $N$, this rule tends to make each stored pattern $\mathbf{V}^s$ a local minimum (attractor) of the energy:

$$
E = -\frac{1}{2} \sum_{i \neq j} T_{ij} V_i V_j
$$

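Because $T_{ij}$ is symmetric, a change $\Delta V_i$ in a single neuron changes the energy by:

$$
\Delta E = -\Delta V_i \sum_j T_{ij} V_j
$$

Under the threshold rule with $U_i = 0$, $\Delta V_i$ is either zero or has the same sign as $\sum_j T_{ij} V_j$, so $\Delta E \le 0$ and no update can raise the energy. The energy is bounded below on the finite set of states, so asynchronous updates eventually reach a stable state in which no neuron changes.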
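The monotone decrease can be checked numerically. The sketch below records $E$ after every single-neuron update with $U_i = 0$ and counts how many updates raised it. The random patterns, sizes, visit schedule, and seed are arbitrary illustrative choices.

```python
import numpy as np

def energy(V, T):
    """E = -1/2 sum_{i != j} T_ij V_i V_j; the diagonal of T is zero, so V @ T @ V suffices."""
    return -0.5 * V @ T @ V

rng = np.random.default_rng(1)
N, n = 50, 3
S = 2 * rng.integers(0, 2, size=(n, N)) - 1   # three random patterns as +/-1 rows
T = S.T @ S                                  # Hebbian couplings (symmetric)
np.fill_diagonal(T, 0)

V = rng.integers(0, 2, size=N)               # random 0/1 starting state
energies = [energy(V, T)]
for i in rng.permutation(np.repeat(np.arange(N), 5)):   # 5 visits per neuron, shuffled
    V[i] = 1 if T[i] @ V > 0 else 0          # threshold rule with U_i = 0
    energies.append(energy(V, T))

increases = sum(e2 > e1 for e1, e2 in zip(energies, energies[1:]))
print("updates that raised E:", increases)
```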

## Capacity and Error Correction

* For $N$ neurons, the network can stably store about $0.15N$ random patterns before retrieval degrades.
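* The crosstalk from the other $n-1$ stored patterns acts as noise on each neuron's input. Treat this noise as Gaussian with standard deviation $\sigma = [(n-1)N/2]^{1/2}$, against a signal of magnitude $N/2$. Then the probability that a given bit of a stored pattern is in error is:

$$
P = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\, dt, \qquad x = \frac{N/2}{\sigma} = \left[\frac{N}{2(n-1)}\right]^{1/2}
$$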

* Hopfield's simulations confirm that recall is accurate when few patterns are stored and degrades as $n$ approaches $0.15N$.
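The error estimate is a Gaussian tail, so it can be evaluated with the complementary error function: $\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2}\,dt = \tfrac12 \operatorname{erfc}(x/\sqrt{2})$. The sketch assumes the signal-to-noise ratio $x = [N/(2(n-1))]^{1/2}$ from the crosstalk estimate (signal $N/2$, noise standard deviation $[(n-1)N/2]^{1/2}$). The function name is hypothetical.

```python
import math

def bit_error_probability(N, n):
    """Gaussian crosstalk estimate of the probability that one stored bit is unstable.

    Assumes signal N/2 and noise std sqrt((n - 1) N / 2), so x = sqrt(N / (2 (n - 1))),
    and P = (1/sqrt(2 pi)) * integral_x^inf exp(-t^2 / 2) dt = erfc(x / sqrt(2)) / 2.
    Requires n >= 2.
    """
    x = math.sqrt(N / (2 * (n - 1)))
    return 0.5 * math.erfc(x / math.sqrt(2))

for n in (5, 10, 15):
    print(n, bit_error_probability(100, n))
```

For $N = 100$ and $n = 10$, this gives $x \approx 2.36$ and $P$ just under 1%. The estimate grows quickly as more patterns are stored.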

## Categorization and Familiarity

* **Generalization**: The system categorizes ambiguous inputs by converging to a nearby stored memory, typically the closest one.
* **Familiarity**: High activation rates during convergence can indicate whether a pattern is familiar.
* **Categorical recall**: Close patterns may collapse into a shared attractor (useful for pattern completion).
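Error correction can be illustrated by corrupting a stored pattern and letting the asynchronous dynamics run. The sketch assumes 3 random 0/1 patterns on $N = 200$ neurons, well below the $0.15N$ limit, and a cue with 10 flipped bits. All sizes and the seed are arbitrary choices, and the result is a single illustrative run, not a measurement.

```python
import numpy as np

rng = np.random.default_rng(42)
N, n = 200, 3
patterns = rng.integers(0, 2, size=(n, N))   # random 0/1 memories V^s
S = 2 * patterns - 1
T = S.T @ S                                  # Hebbian storage, T_ij = sum_s s_i^s s_j^s
np.fill_diagonal(T, 0)

cue = patterns[0].copy()
flip = rng.choice(N, size=10, replace=False)  # corrupt 10 of the 200 bits of memory 0
cue[flip] = 1 - cue[flip]

V = cue.copy()
for _ in range(50):                           # asynchronous sweeps until nothing changes
    changed = False
    for i in rng.permutation(N):
        new = 1 if T[i] @ V > 0 else 0        # threshold rule with U_i = 0
        if new != V[i]:
            V[i], changed = new, True
    if not changed:
        break

hamming = [int(np.sum(V != p)) for p in patterns]
print("Hamming distance from the final state to each memory:", hamming)
```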

## Extensions and Properties

* **Clipped Weights**: Even if $T_{ij} \in \{-1, 0, +1\}$, performance only degrades slightly.
* **Asymmetry**: Even non-symmetric $T_{ij} \neq T_{ji}$ can yield metastable attractors.
* **Forgetting**: Saturating synaptic strength (e.g., $T_{ij} \in [-3, 3]$) introduces natural forgetting.
* **Sequence Recall**: Adding asymmetric terms can allow short sequences $V^1 \to V^2 \to V^3 \to \dots$.
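The clipping and saturation variants can be expressed compactly, under two assumptions. Clipping is taken as $T_{ij} \to \operatorname{sign}(T_{ij})$. Saturation is taken as adding each pattern's Hebbian increment in turn and truncating to $[-3, 3]$ after every step. The helper names are hypothetical. The sketch only shows how the weights are formed and does not reproduce Hopfield's simulation protocol.

```python
import numpy as np

def clip_weights(T):
    """Clipped couplings: keep only the sign of each T_ij, so every entry is -1, 0, or +1."""
    return np.sign(T)

def store_saturating(patterns, limit=3):
    """Add each 0/1 pattern's Hebbian increment in turn, saturating T_ij to [-limit, limit].

    Increments that would push a coupling past its bound are discarded, so the weights
    increasingly reflect recently stored patterns (a simple form of forgetting).
    """
    N = len(patterns[0])
    T = np.zeros((N, N), dtype=int)
    for V in patterns:
        s = 2 * np.asarray(V) - 1
        dT = np.outer(s, s)
        np.fill_diagonal(dT, 0)
        T = np.clip(T + dT, -limit, limit)
    return T

rng = np.random.default_rng(0)
patterns = rng.integers(0, 2, size=(20, 64))
S = 2 * patterns - 1
T_full = S.T @ S
np.fill_diagonal(T_full, 0)

T_clip = clip_weights(T_full)
T_sat = store_saturating(patterns)
print(np.unique(T_clip), T_sat.min(), T_sat.max())
```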

## Biological Plausibility

* Real neurons have a steep, sigmoid relation between input and firing rate, which a binary threshold unit approximates.
* Hebbian learning ($\Delta T_{ij} \propto V_i V_j$) is biologically plausible.
* Delay and stochasticity are modeled via asynchronous updates.


## Conclusions

Hopfield demonstrates that associative memory and pattern completion can emerge as collective properties of simple neuron-like elements. These results suggest:

1. Neural computation does not require complex sequential logic.
2. Distributed systems can perform robust parallel computation.
3. The brain may exploit such physical dynamics for memory, recognition, and decision-making.
4. Hardware implementations (e.g., neuromorphic chips) could benefit from these ideas.

**Key Concepts**: attractor dynamics, energy minimization, Hebbian learning, content-addressable memory, neural computation, error correction, categorization.

**Citation**: J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," *PNAS*, Vol. 79, pp. 2554–2558, 1982.



## 1.1 Representing Neural Memory States

In statistical models of associative memory, a network of $N$ binary neurons can store and retrieve discrete memory patterns by evolving toward specific stable configurations. Each neuron $i \in \{1, 2, \ldots, N\}$ assumes a value $S_i \in \{-1, +1\}$, representing either a quiescent or active state, respectively. The complete state of the system at a given moment is defined by the collection of these binary variables, which form a configuration vector in $N$-dimensional binary space.

To formally define the storage of memory patterns, we adopt the notation:

$$
| \alpha, t \rangle = | S_1^\alpha, S_2^\alpha, \ldots, S_N^\alpha \rangle
\tag{1.1}
$$

Here, $\alpha \in \{1, 2, \ldots, p\}$ indexes the memory pattern, with $p$ denoting the total number of patterns stored in the network. The superscript $\alpha$ on each $S_i^\alpha$ identifies the value of neuron $i$ in the $\alpha$-th pattern. The ket notation $| \alpha, t \rangle$ is borrowed from quantum mechanics and serves to emphasize that the system configuration is treated as a discrete state vector in a high-dimensional space of neuron activations.

Each memory pattern $\alpha$ corresponds to a unique point in this configuration space, and is defined by a fixed binary string:

$$
\boldsymbol{S}^\alpha = (S_1^\alpha, S_2^\alpha, \ldots, S_N^\alpha)
$$
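Hopfield's 0/1 variables and the $\pm 1$ spins used here are related by $S_i = 2V_i - 1$, so the two conventions describe the same patterns. The sketch below converts between them. It also checks that Hopfield's storage rule $T_{ij} = \sum_s (2V_i^s - 1)(2V_j^s - 1)$ equals $N$ times the $1/N$-normalized Hebbian couplings $J_{ij} = \frac{1}{N}\sum_\mu \xi_i^\mu \xi_j^\mu$ used in the spin-glass treatment. The patterns are random, and the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 6, 3
V = rng.integers(0, 2, size=(p, N))     # Hopfield's convention: V_i^s in {0, 1}
xi = 2 * V - 1                          # spin convention: S_i = 2V_i - 1 in {-1, +1}
V_back = (xi + 1) // 2                  # inverse map back to {0, 1}

T = (2 * V - 1).T @ (2 * V - 1)         # Hopfield: T_ij = sum_s (2V_i^s - 1)(2V_j^s - 1)
J = xi.T @ xi / N                       # spin-glass form: J_ij = (1/N) sum_mu xi_i^mu xi_j^mu
np.fill_diagonal(T, 0)
np.fill_diagonal(J, 0.0)
print(np.allclose(T, N * J))
```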

These pattern vectors are embedded into the network via synaptic interactions and serve as **attractors** in the system's energy landscape. During retrieval, a noisy or partial input pattern evolves toward a nearby stored pattern, ideally the nearest one, by descending an appropriately defined energy function.

Although the time index $t$ appears in $| \alpha, t \rangle$, it does not imply that the pattern itself evolves in time. Rather, it indicates that at time $t$ the system is in, or converging toward, the stored pattern $\alpha$. This interpretation becomes central later in the model, where the retrieval dynamics are governed by spin-glass-inspired energy minimization.

Equation (1.1) encapsulates the foundational idea of attractor memory: each $\boldsymbol{S}^\alpha$ represents a learned configuration that the neural network is capable of recalling via its intrinsic dynamics. The collection of these stored states defines the computational repertoire of the network.



# Spin-Glass Models of Neural Networks

## Introduction

The intersection of statistical mechanics and neural computation has provided deep insights into the function of memory and learning in artificial neural networks. In their seminal 1985 work, Amit, Gutfreund, and Sompolinsky introduced a framework in which neural networks are studied as disordered systems analogous to spin glasses. They mapped the Hopfield network onto an Ising-like system whose couplings are random because they are built from random patterns. This mapping let them use tools from statistical physics to analyze the network's memory capacity and retrieval properties quantitatively.

## Binary Neurons and Spin Analogy

Consider a fully connected network of $N$ binary neurons. Each neuron $i$ is represented by a spin variable $S_i \in \{ -1, +1 \}$, corresponding to the inactive and active states, respectively. The state of the entire network at a given time is described by the vector $\mathbf{S} = (S_1, S_2, \dots, S_N)$.

The neurons interact pairwise via symmetric synaptic couplings $J_{ij}$, which are modeled analogously to the interactions in a spin system. The dynamics of the network are governed by the principle of energy minimization, where the energy (or cost) function is given by the Hamiltonian:

$$
H(\mathbf{S}) = -\sum_{i<j} J_{ij} S_i S_j.
$$
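For symmetric couplings with zero diagonal, the pair sum over $i<j$ equals $-\tfrac12 \mathbf{S}^\top J \mathbf{S}$. This is the form usually used in code. The sketch below checks the identity on random symmetric Gaussian couplings, which are an illustrative, Sherrington-Kirkpatrick-style assumption and not the Hebbian couplings of the next section.

```python
import numpy as np

def hamiltonian(S, J):
    """H(S) = -sum_{i<j} J_ij S_i S_j, written as an explicit pair sum."""
    N = len(S)
    return -sum(J[i, j] * S[i] * S[j] for i in range(N) for j in range(i + 1, N))

def hamiltonian_matrix(S, J):
    """Same quantity as -1/2 S^T J S (valid because J is symmetric with J_ii = 0)."""
    return -0.5 * S @ J @ S

rng = np.random.default_rng(0)
N = 10
A = rng.normal(size=(N, N))
J = (A + A.T) / 2                  # symmetric random couplings (illustrative)
np.fill_diagonal(J, 0.0)
S = rng.choice([-1, 1], size=N)
print(hamiltonian(S, J), hamiltonian_matrix(S, J))
```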

This energy function mirrors the Ising model used in spin glass physics, where disordered interactions can lead to multiple stable and metastable states.

## Hebbian Learning and Memory Storage

The central idea behind memory storage is to embed a set of patterns $\{ \boldsymbol{\xi}^\mu \}$, where $\mu = 1, 2, \dots, p$, into the synaptic weights. Each pattern $\boldsymbol{\xi}^\mu$ is an $N$-dimensional binary vector $\xi_i^\mu \in \{ -1, +1 \}$, representing the desired memory to be stored.

Using Hebbian learning, the synaptic weights are constructed as:

$$
J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu,
$$

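with $J_{ii} = 0$ to prevent self-interactions. This rule strengthens couplings between neurons that are co-active across the stored patterns. As long as the number of patterns is small compared with $N$, it makes the stored patterns (or states very close to them) minima of the energy.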
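One way to check the claim about energy minima is to test stability under single spin flips. Flipping $S_i$ changes $H$ by $2 S_i h_i$, where $h_i = \sum_j J_{ij} S_j$. A pattern is therefore a single-flip local minimum if $\xi_i h_i > 0$ for every $i$. The sketch assumes $p = 3$ random patterns on $N = 100$ neurons, with an arbitrary seed. The helper name `hebbian_J` is illustrative.

```python
import numpy as np

def hebbian_J(xi):
    """J_ij = (1/N) sum_mu xi_i^mu xi_j^mu with J_ii = 0; xi has shape (p, N), entries +/-1."""
    N = xi.shape[1]
    J = xi.T @ xi / N
    np.fill_diagonal(J, 0.0)
    return J

rng = np.random.default_rng(0)
N, p = 100, 3
xi = rng.choice([-1, 1], size=(p, N))
J = hebbian_J(xi)

# Row mu of h holds the local fields h_i = sum_j J_ij xi_j^mu evaluated at pattern mu.
h = xi @ J
# Flipping spin i changes H by 2 * xi_i * h_i, so all-positive products mean a local minimum.
stable = (xi * h > 0).all(axis=1)
print("single-flip local minimum:", stable)
```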

## Retrieval and Overlap

To retrieve a memory, the network starts in a noisy or partial version of one of the stored patterns and is allowed to evolve via asynchronous updates. The network state evolves to minimize the energy $H(\mathbf{S})$, ideally converging to a stored pattern.

To quantify retrieval, the **overlap** between the current network state $\mathbf{S}$ and a stored pattern $\boldsymbol{\xi}^\mu$ is defined as:

$$
m^\mu = \frac{1}{N} \sum_{i=1}^{N} S_i \xi_i^\mu.
$$

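If $m^\mu = 1$, the state coincides with pattern $\mu$. A value $m^\mu \approx 1$ indicates a close match, and $m^\mu \approx 0$ indicates no correlation. Because $H$ is unchanged under $\mathbf{S} \to -\mathbf{S}$, the reversed pattern ($m^\mu = -1$) is also a minimum.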
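The overlap is a normalized dot product, so all $p$ overlaps can be computed at once as a matrix-vector product. The sketch below flips a known number of spins so that the expected overlap can be computed by hand: flipping 100 of 1000 spins gives $(900 - 100)/1000 = 0.8$. The sizes, seed, and helper name `overlaps` are illustrative.

```python
import numpy as np

def overlaps(S, xi):
    """m^mu = (1/N) sum_i S_i xi_i^mu, returned for every stored pattern mu.

    S  : (N,) network state with entries +/-1
    xi : (p, N) stored patterns with entries +/-1
    """
    return xi @ S / len(S)

rng = np.random.default_rng(0)
N, p = 1000, 3
xi = rng.choice([-1, 1], size=(p, N))
S = xi[0].copy()
S[:100] *= -1            # flip 100 of 1000 spins of the first pattern
m = overlaps(S, xi)
print(np.round(m, 3))    # first entry is exactly 0.8; the others are small (random patterns)
```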

## Statistical Mechanics Analysis

The energy landscape of the network is highly non-trivial, especially as the number of stored patterns $p$ increases. When $p$ is small, the stored patterns are attractors of the dynamics, although spurious minima (for example, mixtures of several patterns) already exist. As $p$ grows, interference (crosstalk) between patterns distorts these minima. Beyond a critical load, the retrieval states disappear.

The authors analyze the system in the thermodynamic limit $N \to \infty$ using mean-field techniques, including the replica method for the case where $p$ grows in proportion to $N$. They derive self-consistent equations for the overlap $m$, together with **spin-glass order parameters** that describe the disordered contribution of the many non-retrieved patterns.

The key result is that the **critical storage capacity** of the network is approximately:

$$
\alpha_c = \frac{p_c}{N} \approx 0.138,
$$

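where $\alpha = p/N$ is the memory loading ratio. Below this threshold, stored patterns can be reliably retrieved, with only a small fraction of erroneous bits. Above it, the retrieval states disappear and the network is left in a **spin-glass phase**. This phase is characterized by numerous metastable states, none of which has a macroscopic overlap with any stored pattern.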
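The capacity limit can be probed with a small zero-temperature simulation. The procedure stores $p = \alpha N$ random patterns, starts exactly at the first one, relaxes with asynchronous $S_i \leftarrow \operatorname{sign}(h_i)$ updates, and reports the final overlap. This is a sketch under stated assumptions: $N = 400$, the list of loadings, the seed, and tie-breaking toward $+1$ are arbitrary choices. At this finite size the transition is smeared out, so the printed overlaps only roughly reflect $\alpha_c \approx 0.138$ and will vary with $N$ and the seed.

```python
import numpy as np

def hebbian_J(xi):
    """J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, J_ii = 0."""
    N = xi.shape[1]
    J = xi.T @ xi / N
    np.fill_diagonal(J, 0.0)
    return J

def relax_zero_T(S, J, rng, max_sweeps=50):
    """Zero-temperature asynchronous dynamics: S_i <- sign(h_i) until no spin flips."""
    S = S.copy()
    for _ in range(max_sweeps):
        flips = 0
        for i in rng.permutation(len(S)):
            new = 1 if J[i] @ S >= 0 else -1   # ties broken toward +1 (arbitrary)
            if new != S[i]:
                S[i] = new
                flips += 1
        if flips == 0:
            break
    return S

def retrieval_overlap(N, alpha, rng):
    """Store p = alpha * N random patterns, start exactly at the first, return the final overlap."""
    p = max(1, round(alpha * N))
    xi = rng.choice([-1, 1], size=(p, N))
    S = relax_zero_T(xi[0], hebbian_J(xi), rng)
    return xi[0] @ S / N

rng = np.random.default_rng(0)
for alpha in (0.05, 0.10, 0.20, 0.30):
    print(f"alpha = {alpha:.2f}  final overlap m = {retrieval_overlap(400, alpha, rng):.3f}")
```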

## Phase Transitions and Noise

The authors extend their analysis to finite temperatures by considering thermal noise in the neuron update rule, analogous to Boltzmann machines. At nonzero temperature $T$, the probability of neuron $i$ adopting state $S_i$ is governed by:

$$
P(S_i) \propto \exp\left( \frac{S_i h_i}{T} \right),
$$

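where $h_i = \sum_j J_{ij} S_j$ is the local field at neuron $i$. Normalizing over the two values $S_i = \pm 1$ gives $P(S_i = +1) = \left[1 + e^{-2h_i/T}\right]^{-1}$. As $T \to 0$, this reduces to the deterministic rule $S_i = \operatorname{sign}(h_i)$.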
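The stochastic rule can be sketched as a heat-bath (Glauber) sweep that uses the normalized probability $P(S_i = +1) = [1 + e^{-2h_i/T}]^{-1}$. The usage lines assume a lightly loaded network ($p = 5$, $N = 500$). They compare a low temperature, where the state should stay near the stored pattern, with a high temperature, where the overlap should decay toward zero. Sizes, temperatures, sweep count, and seed are illustrative choices, and the helper name `glauber_sweep` is hypothetical.

```python
import numpy as np

def glauber_sweep(S, J, T, rng):
    """One asynchronous heat-bath sweep at temperature T > 0 (updates S in place).

    P(S_i) proportional to exp(S_i h_i / T), normalized over S_i = +/-1, gives
    P(S_i = +1) = 1 / (1 + exp(-2 h_i / T)).
    """
    for i in rng.permutation(len(S)):
        h = J[i] @ S                                  # local field h_i
        p_up = 1.0 / (1.0 + np.exp(-2.0 * h / T))
        S[i] = 1 if rng.random() < p_up else -1
    return S

rng = np.random.default_rng(0)
N, p = 500, 5
xi = rng.choice([-1, 1], size=(p, N))
J = xi.T @ xi / N
np.fill_diagonal(J, 0.0)

final_overlap = {}
for T in (0.2, 2.0):
    S = xi[0].copy()                                  # start in the first stored pattern
    for _ in range(30):
        glauber_sweep(S, J, T, rng)
    final_overlap[T] = xi[0] @ S / N
print(final_overlap)
```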

For a loading $0 < \alpha < \alpha_c$, increasing the temperature carries the network through a sequence of phases:

* **Low temperature**: retrieval phase with large overlap $m^\mu$.
* **Intermediate temperature**: spin-glass phase with many local minima.
* **High temperature**: paramagnetic phase with no memory structure.

This behavior parallels thermodynamic phase diagrams in disordered systems.

## Conclusion

Amit, Gutfreund, and Sompolinsky’s framework demonstrated that recurrent neural networks can be analyzed systematically using the theory of spin glasses. The Hopfield model, under Hebbian learning, exhibits a rich set of phenomena: attractor dynamics, memory interference, phase transitions, and complex energy landscapes. These insights provided a foundation for the statistical mechanics of learning and memory, influencing both neuroscience and machine learning.
