
## Introduction

Hopfield proposed that computational properties useful to both organisms and computers can emerge **collectively** from large assemblies of simple interacting units (neurons). Such emergent computation does not require complex circuitry. It arises spontaneously from the interactions, much as collective behavior arises in physical systems such as magnetic domains or fluid vortices. Hopfield's paper presents a model that exhibits **content-addressable memory**, error correction, generalization, and categorization. All of these emerge from the network's dynamics rather than from programmed instructions.


## Content-Addressable Memory and Dynamics

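Let a memory be represented as a point $\mathbf{X} \in \mathbb{R}^N$ in the state space of a physical system. In many physical systems, dynamics that descend an energy function drive states toward stable attractors; the spin systems of magnetism (e.g., Ising models) are a discrete example. For continuous states, the simplest such dynamics is gradient descent on an energy $E$: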

$$
\frac{d\mathbf{X}}{dt} = -\nabla E(\mathbf{X})
$$

This system acts as a **content-addressable memory** if every partial or noisy input state $\mathbf{X}' \approx \mathbf{X}_a$ flows toward a stable point $\mathbf{X}_a$. Hopfield demonstrates that such dynamics can recover full memories from fragments.
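
The attractor picture can be made concrete with a minimal sketch. It integrates $d\mathbf{X}/dt = -\nabla E(\mathbf{X})$ with forward Euler steps for an illustrative double-well energy $E(\mathbf{X}) = \sum_i (X_i^2 - 1)^2$. This energy is an assumption chosen for illustration; it is not Hopfield's energy. Its stable points are the corners $X_i = \pm 1$, and a corrupted starting point flows back to the nearest one. The function names, step size and step count are hypothetical choices.

```python
def grad_energy(x):
    """Gradient of the illustrative energy E(x) = sum_i (x_i**2 - 1)**2."""
    return [4.0 * xi * (xi * xi - 1.0) for xi in x]


def gradient_flow(x, step=0.01, n_steps=2000):
    """Forward-Euler integration of dx/dt = -grad E(x)."""
    x = list(x)
    for _ in range(n_steps):
        g = grad_energy(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


stored = [1.0, -1.0, 1.0, 1.0]     # one attractor (local minimum) of E
probe = [0.6, -0.3, 0.8, 0.2]      # a partial / noisy version of it
final = gradient_flow(probe)
print([round(v, 3) for v in final])  # → [1.0, -1.0, 1.0, 1.0]
```

Each coordinate starting in $(0, 1)$ rises to $+1$, and each starting in $(-1, 0)$ falls to $-1$. The full memory is therefore recovered from a fragment that only has the right signs.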

## The Hopfield Model

Each of the $N$ neurons is binary:

$$
V_i \in \{0, 1\} \quad \text{or equivalently} \quad s_i = 2V_i - 1 \in \{-1, +1\}
$$

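Each neuron re-evaluates its state at random times, independently of the others (asynchronous updating), using the threshold rule:

$$
V_i \leftarrow
\begin{cases}
1 & \text{if } \sum_{j \neq i} T_{ij} V_j > U_i \\
0 & \text{otherwise}
\end{cases}
$$

where $T_{ij}$ is the strength of the connection from neuron $j$ to neuron $i$, and $U_i$ is the threshold of neuron $i$ (often taken to be 0).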
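
The threshold rule can be sketched in a few lines of Python. The sketch assumes neurons are visited once per sweep in a random permutation. That is a common discrete stand-in for Hopfield's independent random update times, not his exact scheme. The helpers `local_field`, `update_neuron` and `async_sweep` are hypothetical names, and the 3-neuron weight matrix is an arbitrary toy example.

```python
import random


def local_field(T, V, i):
    """Input to neuron i: sum over j != i of T[i][j] * V[j]."""
    return sum(T[i][j] * V[j] for j in range(len(V)) if j != i)


def update_neuron(T, V, U, i):
    """Threshold rule: V_i <- 1 if its input exceeds U_i, else 0 (in place)."""
    V[i] = 1 if local_field(T, V, i) > U[i] else 0


def async_sweep(T, V, U, rng):
    """Update every neuron once, in random order; return True if any state changed."""
    changed = False
    order = list(range(len(V)))
    rng.shuffle(order)
    for i in order:
        old = V[i]
        update_neuron(T, V, U, i)
        changed |= (V[i] != old)
    return changed


# Toy network: neurons 0 and 1 excite each other; both inhibit neuron 2.
T = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
U = [0, 0, 0]
V = [1, 0, 1]
rng = random.Random(0)
while async_sweep(T, V, U, rng):
    pass
print(V)  # a stable state; which one depends on the update order
```

For this toy matrix the only stable states are `[1, 1, 0]` and `[0, 0, 0]`. The state that is reached depends on the random update order.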


## Hebbian Learning Rule

To store a set of binary patterns $\{\mathbf{V}^s\}_{s=1}^n$, Hopfield applies the Hebbian learning rule:

$$
T_{ij} = \sum_{s=1}^{n} (2V_i^s - 1)(2V_j^s - 1), \quad T_{ii} = 0
$$
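
The Hebbian rule above can be written directly as a minimal sketch. Each product $(2V_i^s - 1)(2V_j^s - 1)$ is $+1$ when neurons $i$ and $j$ agree in pattern $s$ and $-1$ when they disagree. The function name `hebbian_weights` and the two 6-neuron patterns are illustrative assumptions.

```python
def hebbian_weights(patterns):
    """T_ij = sum_s (2V_i^s - 1)(2V_j^s - 1) for i != j, with T_ii = 0."""
    N = len(patterns[0])
    T = [[0] * N for _ in range(N)]
    for V in patterns:
        s = [2 * v - 1 for v in V]          # map {0, 1} -> {-1, +1}
        for i in range(N):
            for j in range(N):
                if i != j:
                    T[i][j] += s[i] * s[j]
    return T


patterns = [[1, 0, 1, 0, 1, 0],
            [1, 1, 0, 0, 1, 1]]
T = hebbian_weights(patterns)
print(T[0])  # → [0, 0, 0, -2, 2, 0]
```

The resulting matrix is symmetric with a zero diagonal. These are the two properties the energy argument below relies on.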

As long as the number of patterns $n$ is small compared with $N$, this rule makes each stored pattern $\mathbf{V}^s$ a local minimum (attractor), or very nearly one, of the energy function:

$$
E = -\frac{1}{2} \sum_{i \neq j} T_{ij} V_i V_j
$$

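With symmetric $T_{ij}$ and all thresholds $U_i = 0$, changing neuron $i$ by $\Delta V_i$ changes the energy by

$$
\Delta E = -\Delta V_i \sum_{j \neq i} T_{ij} V_j
$$

The threshold rule makes $\Delta V_i$ positive only when $\sum_{j \neq i} T_{ij} V_j > 0$, and negative only when that sum is not positive. Hence $\Delta E \le 0$ for every update: the energy never increases.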

Thus, asynchronous updates guarantee convergence to a stable state.
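
The descent argument can be checked numerically with a short sketch. It stores a few random patterns, applies single-neuron updates at randomly chosen neurons (with $U_i = 0$), and records $E$ after every update. The network size, number of patterns, seed, update count and helper names are all illustrative assumptions.

```python
import random


def hebbian_weights(patterns):
    """Hopfield storage rule: T_ij = sum_s (2V_i^s - 1)(2V_j^s - 1), T_ii = 0."""
    N = len(patterns[0])
    T = [[0] * N for _ in range(N)]
    for V in patterns:
        s = [2 * v - 1 for v in V]
        for i in range(N):
            for j in range(N):
                if i != j:
                    T[i][j] += s[i] * s[j]
    return T


def energy(T, V):
    """E = -1/2 sum_{i != j} T_ij V_i V_j (all thresholds U_i = 0)."""
    N = len(V)
    return -0.5 * sum(T[i][j] * V[i] * V[j]
                      for i in range(N) for j in range(N) if i != j)


def run_async(T, V, rng, n_updates):
    """Update randomly chosen neurons one at a time; return final state and energy trace."""
    V = list(V)
    N = len(V)
    trace = [energy(T, V)]
    for _ in range(n_updates):
        i = rng.randrange(N)
        h = sum(T[i][j] * V[j] for j in range(N) if j != i)
        V[i] = 1 if h > 0 else 0
        trace.append(energy(T, V))
    return V, trace


rng = random.Random(1)
N = 20
patterns = [[rng.randint(0, 1) for _ in range(N)] for _ in range(3)]
T = hebbian_weights(patterns)
start = [rng.randint(0, 1) for _ in range(N)]
final, trace = run_async(T, start, rng, n_updates=400)
print(trace[0], "->", trace[-1])
```

Every entry of `trace` is less than or equal to the previous one. The values are half-integers, so the comparison is exact in floating point.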

## Capacity and Error Correction

* For $N$ neurons, the network can stably store about $0.15N$ random patterns before retrieval degrades.
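* Crosstalk from the other stored patterns acts like Gaussian noise on each neuron's input. Against a signal of about $N/2$, the noise has root-mean-square amplitude $\sigma = \left[(n-1)N/2\right]^{1/2}$. The probability that a given bit of a stored pattern is unstable is then approximately

$$
P = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2} \, dt, \qquad x = \frac{N/2}{\left[(n-1)N/2\right]^{1/2}}
$$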

* Simulations confirm recall is accurate for low pattern count and degrades as $n$ approaches $0.15N$.
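
The Gaussian estimate can be evaluated and compared with a direct count in a small sketch. It uses the identity $P = Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The Monte Carlo part draws random $\{0,1\}$ patterns, builds $T_{ij}$ implicitly, and counts the stored bits that one threshold update ($U_i = 0$) would flip. The trial count, seed and function names are illustrative assumptions. The two numbers should be of similar size but need not match exactly, because the Gaussian form is only an approximation.

```python
import math
import random


def p_unstable_bit(N, n):
    """Gaussian estimate P = Q(x) with x = (N/2) / sqrt((n - 1) N / 2)."""
    x = (N / 2) / math.sqrt((n - 1) * N / 2)
    return 0.5 * math.erfc(x / math.sqrt(2))   # upper Gaussian tail Q(x)


def empirical_unstable_fraction(N, n, trials, rng):
    """Fraction of stored bits that one threshold update (U_i = 0) would flip."""
    errors = total = 0
    for _ in range(trials):
        pats = [[rng.randint(0, 1) for _ in range(N)] for _ in range(n)]
        spins = [[2 * v - 1 for v in p] for p in pats]
        for V in pats:
            # c[t] = sum_j (2V_j^t - 1) V_j: overlap of pattern t with state V
            c = [sum(s[j] * V[j] for j in range(N)) for s in spins]
            for i in range(N):
                # h_i = sum_{j != i} T_ij V_j with T_ij = sum_t s_i^t s_j^t
                h = sum(s[i] * (c[t] - s[i] * V[i]) for t, s in enumerate(spins))
                errors += (1 if h > 0 else 0) != V[i]
                total += 1
    return errors / total


N, n = 100, 10
print(f"Gaussian estimate: {p_unstable_bit(N, n):.4f}")
print(f"Simulated:         {empirical_unstable_fraction(N, n, 50, random.Random(0)):.4f}")
```

For $N = 100$ and $n = 10$, $x \approx 2.36$ and the estimate is just under 1% per bit. Increasing $n$ shrinks $x$, and the per-bit error rate grows quickly.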

## Categorization and Familiarity

* **Generalization**: The system categorizes an ambiguous input by converging to a nearby stored memory, typically the one closest to it in Hamming distance.
* **Familiarity**: High activation rates during convergence can indicate whether a pattern is familiar.
* **Categorical recall**: Close patterns may collapse into a shared attractor (useful for pattern completion).
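
Pattern completion can be demonstrated with a sketch. It stores a few random patterns, corrupts one of them, and runs asynchronous threshold updates ($U_i = 0$) until no neuron changes. With the settings below ($N = 100$, three patterns, ten flipped bits), the probe is expected to return to the stored pattern, since the load is far below $0.15N$. The sizes, seed and helper names (`recall`, `hamming`) are illustrative assumptions.

```python
import random


def hebbian_weights(patterns):
    """T_ij = sum_s (2V_i^s - 1)(2V_j^s - 1), T_ii = 0."""
    N = len(patterns[0])
    T = [[0] * N for _ in range(N)]
    for V in patterns:
        s = [2 * v - 1 for v in V]
        for i in range(N):
            for j in range(N):
                if i != j:
                    T[i][j] += s[i] * s[j]
    return T


def recall(T, probe, rng, max_sweeps=100):
    """Asynchronous threshold updates (U_i = 0) until a full sweep changes nothing."""
    V = list(probe)
    N = len(V)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(N))
        rng.shuffle(order)
        for i in order:
            h = sum(T[i][j] * V[j] for j in range(N) if j != i)
            new = 1 if h > 0 else 0
            if new != V[i]:
                V[i] = new
                changed = True
        if not changed:
            break
    return V


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)
N, n = 100, 3
patterns = [[rng.randint(0, 1) for _ in range(N)] for _ in range(n)]
T = hebbian_weights(patterns)

# Corrupt pattern 0 by flipping 10 randomly chosen bits.
probe = list(patterns[0])
for i in rng.sample(range(N), 10):
    probe[i] = 1 - probe[i]

recovered = recall(T, probe, rng)
print(hamming(probe, patterns[0]), "->", hamming(recovered, patterns[0]))
```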

## Extensions and Properties

* **Clipped Weights**: Even if $T_{ij} \in \{-1, 0, +1\}$, performance only degrades slightly.
* **Asymmetry**: Even with non-symmetric couplings ($T_{ij} \neq T_{ji}$), the network can retain metastable attractors.
* **Forgetting**: Saturating synaptic strength (e.g., $T_{ij} \in [-3, 3]$) introduces natural forgetting.
* **Sequence Recall**: Adding asymmetric terms can allow short sequences $V^1 \to V^2 \to V^3 \to \dots$.
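
The clipped-weight variant can be sketched by replacing each $T_{ij}$ with its sign. The sketch then measures the fraction of stored bits that stay unchanged under one threshold update. With an odd number of patterns, every $T_{ij}$ is odd, so the clipped entries are exactly $\pm 1$. The network size, pattern count, seed and helper names are illustrative assumptions.

```python
import random


def hebbian_weights(patterns):
    """T_ij = sum_s (2V_i^s - 1)(2V_j^s - 1), T_ii = 0."""
    N = len(patterns[0])
    T = [[0] * N for _ in range(N)]
    for V in patterns:
        s = [2 * v - 1 for v in V]
        for i in range(N):
            for j in range(N):
                if i != j:
                    T[i][j] += s[i] * s[j]
    return T


def clip_weights(T):
    """Replace each T_ij by its sign, giving entries in {-1, 0, +1}."""
    return [[(t > 0) - (t < 0) for t in row] for row in T]


def stable_fraction(T, patterns):
    """Fraction of stored bits left unchanged by one threshold update (U_i = 0)."""
    N = len(patterns[0])
    ok = total = 0
    for V in patterns:
        for i in range(N):
            h = sum(T[i][j] * V[j] for j in range(N) if j != i)
            ok += (1 if h > 0 else 0) == V[i]
            total += 1
    return ok / total


rng = random.Random(7)
N, n = 100, 5
patterns = [[rng.randint(0, 1) for _ in range(N)] for _ in range(n)]
T = hebbian_weights(patterns)
print("full weights:   ", stable_fraction(T, patterns))
print("clipped weights:", stable_fraction(clip_weights(T), patterns))
```

Clipping discards the magnitude of each weight and keeps only its sign. The stored patterns should nevertheless remain almost entirely stable at this low load.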

## Biological Plausibility

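* A two-state unit (not firing, or firing at its maximum rate) is a coarse but reasonable approximation to real neurons, whose firing rates saturate.
* Hebbian learning ($\Delta T_{ij} \propto V_i V_j$) is biologically plausible because each weight change depends only on the activity of the two neurons that the synapse connects; the storage rule above is a variant of it.
* Random asynchronous updating stands in for the absence of a global clock and for the delays and noise of real neural signaling.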


## Conclusions

Hopfield demonstrates that associative memory and pattern completion can emerge as collective properties of simple neuron-like elements. These results suggest:

1. Neural computation does not require complex sequential logic.
2. Distributed systems can perform robust parallel computation.
3. The brain may exploit such physical dynamics for memory, recognition, and decision-making.
4. Hardware implementations (e.g., neuromorphic chips) could benefit from these ideas.

**Key Concepts**: attractor dynamics, energy minimization, Hebbian learning, content-addressable memory, neural computation, error correction, categorization.

**Citations**: J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," *PNAS*, Vol. 79, pp. 2554–2558, 1982.



# 1. Models of Neural Networks

*A Statistical Mechanics Approach to Associative Memory*

## 1.1 Binary-State Neurons and Spin Variables

In their foundational work, Amit, Gutfreund, and Sompolinsky propose a class of neural network models built on an analogy between networks of binary neurons and disordered spin systems from statistical mechanics. The framework incorporates and extends the **Hopfield model** and an earlier related model by **Little**. Both treat neurons as binary units governed by deterministic or stochastic update rules. Hopfield's model updates neurons asynchronously, one at a time, whereas Little's model updates all neurons in parallel \[A584–A586].

Each neuron is modeled as a two-state element, akin to an **Ising spin**, assuming a value $S_i \in \{-1, +1\}$, where:

* $+1$: the neuron is active, that is, it has fired (emitted an action potential) within a brief integration interval (typically on the order of a millisecond),
* $-1$: the neuron is inactive, corresponding to the absence of firing \[A588].

The full configuration of the network at a given moment is expressed as a vector in binary phase space. For a stored memory pattern labeled by $\alpha$, the instantaneous state of the system is denoted:

$$
| \alpha, t \rangle = | S_1^\alpha, S_2^\alpha, \ldots, S_N^\alpha; t \rangle
\tag{1.1}
$$

This state vector captures the activation states of all $N$ neurons at time $t$. It is one of $2^N$ possible network configurations, the vertices of an $N$-dimensional hypercube \[A589–A590]. Each stored memory corresponds to a specific attractor configuration in this space.

## 1.2 Synaptic Connectivity and Postsynaptic Potentials

Neurons interact via **synaptic junctions**, whose strengths $J_{ij}$ represent the influence that neuron $j$ exerts on neuron $i$. These couplings may be:

* **Positive** ($J_{ij} > 0$): excitatory, increasing the likelihood that neuron $i$ will fire;
* **Negative** ($J_{ij} < 0$): inhibitory, suppressing neuron $i$'s activation.

During each integration period, the **postsynaptic potential** $V_i$ received by neuron $i$ is given by:

$$
V_i = \sum_j J_{ij} (S_j + 1)
\tag{1.2}
$$

This formulation uses a shifted spin variable $S_j + 1$, which maps the binary alphabet $\{-1, +1\}$ to $\{0, 2\}$. As a result, **only active neurons** contribute positively to the sum: neurons in state $S_j = +1$ contribute $2J_{ij}$, while silent neurons $S_j = -1$ contribute nothing. This emphasizes biological realism: only firing neurons induce postsynaptic potentials \[A591–A594].
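
Equation (1.2) can be checked numerically in a few lines. The coupling values below are arbitrary and chosen only for illustration, and `postsynaptic_potential` is a hypothetical helper name. The check also confirms the identity $V_i = \sum_j J_{ij} S_j + \sum_j J_{ij}$, which is used later when the thresholds are chosen.

```python
def postsynaptic_potential(J, S, i):
    """V_i = sum_j J_ij (S_j + 1): firing neurons add 2 J_ij, silent ones add 0."""
    return sum(J[i][j] * (S[j] + 1) for j in range(len(S)))


J = [[0.0, 0.5, -1.0],
     [0.5, 0.0, 2.0],
     [-1.0, 2.0, 0.0]]
S = [+1, -1, +1]                          # neurons 0 and 2 firing, neuron 1 silent
print(postsynaptic_potential(J, S, 1))    # 2*0.5 + 0 + 2*2.0 → 5.0
```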

## 1.3 Thresholding and Neuronal Stability

The decision of whether a neuron flips its state is governed by comparing its synaptic input $V_i$ to a **threshold** $U_i$. The **molecular field** (or net driving input) to neuron $i$ is then defined as $h_i = V_i - U_i$.

A configuration is stable, meaning that no neuron changes state under the update rule, when every neuron satisfies the **stability condition**:

$$
S_i h_i = S_i (V_i - U_i) > 0
\tag{1.3}
$$

This condition states that each spin $S_i$ is aligned with its molecular field $h_i$ \[A594–A596]. If the product is negative, the neuron is misaligned, hence energetically unstable, and it will flip the next time it is updated. Neurons therefore evolve by **flipping to align with their inputs**. With symmetric couplings and one neuron updated at a time, this rule guarantees that the system descends to a local minimum of the energy function $H$ defined below.
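
The stability test of Eq. (1.3) can be sketched with general thresholds $U_i$. The toy network is an assumption for illustration: four mutually excitatory neurons with $U_i = \sum_j J_{ij}$. The helper names `molecular_field`, `misaligned` and `is_stable` are hypothetical. Flipping the one misaligned neuron makes every neuron satisfy $S_i h_i > 0$.

```python
def molecular_field(J, S, U, i):
    """h_i = V_i - U_i, with V_i = sum_j J_ij (S_j + 1) as in Eq. (1.2)."""
    V_i = sum(J[i][j] * (S[j] + 1) for j in range(len(S)))
    return V_i - U[i]


def misaligned(J, S, U):
    """Neurons with S_i h_i < 0; each flips the next time it is updated."""
    return [i for i in range(len(S)) if S[i] * molecular_field(J, S, U, i) < 0]


def is_stable(J, S, U):
    """Stability condition of Eq. (1.3): S_i h_i > 0 for every neuron."""
    return all(S[i] * molecular_field(J, S, U, i) > 0 for i in range(len(S)))


# Toy network: 4 mutually excitatory neurons.
N = 4
J = [[0 if i == j else 1 for j in range(N)] for i in range(N)]
U = [sum(row) for row in J]   # U_i = sum_j J_ij, so h_i reduces to sum_j J_ij S_j
S = [+1, +1, +1, -1]
print(misaligned(J, S, U))    # → [3]
S[3] = -S[3]                  # align neuron 3 with its field
print(is_stable(J, S, U))     # → True
```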

## 1.4 The Hamiltonian and Energy Descent

The entire network behaves like a thermodynamic system with a scalar energy function $H$, known as the **Hamiltonian**, which governs its evolution. This energy function is defined as:

$$
H = -\frac{1}{2} \sum_i h_i S_i = -\frac{1}{2} \sum_{i \neq j} J_{ij} S_i S_j
\tag{1.4}
$$

The second equality holds when the thresholds are set to $U_i = \sum_j J_{ij}$. The molecular field then becomes purely interaction-driven, $h_i = \sum_j J_{ij} S_j$, and any external bias is absorbed into the interaction structure. The network then operates like a **zero-field spin glass**.

In the first form, the energy measures how well each neuron is aligned with its local molecular field. In the second form, it is written as a sum over **pairwise spin interactions**. With **symmetric couplings** ($J_{ij} = J_{ji}$), each pair $(i, j)$ appears twice in the double sum, and the factor $\frac{1}{2}$ compensates for this double-counting \[A596–A598].

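The significance of Equation (1.4) lies in its **descent property**, a discrete analogue of gradient descent. Under the deterministic update rule implied by Equation (1.3), flipping a misaligned spin $S_k$ changes the energy by $\Delta H = 2 S_k h_k = -2|h_k| < 0$, where $S_k$ is the value before the flip. Every flip therefore lowers $H$. Because the number of configurations is finite, the dynamics must end in a **local energy minimum**. These minima represent the **memory attractors** of the system.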
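
The descent property can be verified in a short sketch. It draws random symmetric Gaussian couplings with zero diagonal and runs zero-temperature asynchronous updates in the zero-field case ($U_i = \sum_j J_{ij}$). After every flip it checks that the change in $H$ equals $2 S_k h_k$. The coupling distribution, size, seed and helper names are illustrative assumptions.

```python
import random


def field(J, S, i):
    """Zero-field molecular field h_i = sum_j J_ij S_j (thresholds U_i = sum_j J_ij)."""
    return sum(J[i][j] * S[j] for j in range(len(S)))


def hamiltonian(J, S):
    """H = -1/2 sum_i h_i S_i = -1/2 sum_{i != j} J_ij S_i S_j (Eq. 1.4, J_ii = 0)."""
    return -0.5 * sum(S[i] * field(J, S, i) for i in range(len(S)))


rng = random.Random(3)
N = 12
# Arbitrary symmetric couplings with zero diagonal, drawn from a unit Gaussian.
J = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        J[i][j] = J[j][i] = rng.gauss(0.0, 1.0)

S = [rng.choice((-1, 1)) for _ in range(N)]
history = [hamiltonian(J, S)]
for _ in range(20 * N):                  # random asynchronous updates
    k = rng.randrange(N)
    h = field(J, S, k)
    if S[k] * h < 0:                     # misaligned spin: flip it
        predicted = 2 * S[k] * h         # Delta H = 2 S_k h_k = -2 |h_k|
        S[k] = -S[k]
        history.append(hamiltonian(J, S))
        assert abs((history[-1] - history[-2]) - predicted) < 1e-9
print(len(history) - 1, "flips; H went from", history[0], "to", history[-1])
```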

## 1.5 Memory Storage via Hebbian Learning

To embed memories in the network, the authors define a **learning rule** that determines the synaptic couplings $J_{ij}$ from a set of stored patterns $\{\boldsymbol{\xi}^\mu\}_{\mu=1}^p$. Each component $\xi_i^\mu \in \{-1, +1\}$ gives the desired state of neuron $i$ in memory $\mu$. Each component is treated as a **quenched random variable**, drawn independently and equal to $+1$ or $-1$ with equal probability \[A601–A602].

The synaptic matrix is then constructed as:

$$
J_{ij} = \frac{1}{N} \sum_{\mu=1}^p \xi_i^\mu \xi_j^\mu, \quad i \ne j
\tag{1.5}
$$

This is the classic **Hebbian rule**, reflecting the principle that "neurons that fire together, wire together." The normalization by $N$ keeps the molecular fields of order one in the thermodynamic limit $N \to \infty$. When the number of patterns $p$ is small compared with $N$, the crosstalk between patterns is weak. Each stored pattern $\boldsymbol{\xi}^\mu$ is then a **fixed point** (attractor) of the dynamics, that is, a stable configuration under the update rule \[A600–A604].
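
Eq. (1.5) can be tested in a sketch at low load. It builds $J_{ij}$ from a few random $\pm 1$ patterns and checks the stability condition $S_i h_i > 0$ (zero-field case) for each stored pattern. The sizes ($N = 200$, $p = 5$), seed and helper names are illustrative assumptions. At this load the crosstalk is tiny compared with the signal, so every stored pattern should be a fixed point.

```python
import random


def hebbian_couplings(patterns):
    """J_ij = (1/N) sum_mu xi_i^mu xi_j^mu for i != j, J_ii = 0 (Eq. 1.5)."""
    N = len(patterns[0])
    J = [[0.0] * N for _ in range(N)]
    for xi in patterns:
        for i in range(N):
            for j in range(N):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / N
    return J


def is_fixed_point(J, S):
    """True if every spin satisfies S_i h_i > 0 with h_i = sum_j J_ij S_j."""
    N = len(S)
    return all(S[i] * sum(J[i][j] * S[j] for j in range(N)) > 0 for i in range(N))


rng = random.Random(11)
N, p = 200, 5
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
J = hebbian_couplings(patterns)
print([is_fixed_point(J, xi) for xi in patterns])
```

In the zero-field case, $H$ is unchanged under $S \to -S$. The reversed patterns $-\boldsymbol{\xi}^\mu$ are therefore fixed points as well.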

## 1.6 Summary and Conceptual Foundations

In summary, Section A of the paper establishes a **mathematical and physiological framework** for modeling memory in neural networks. Neurons are modeled as binary spins, their interactions as symmetric synaptic couplings, and their dynamics as energy descent in a spin-glass-like system. Stable memories are represented as local minima in a high-dimensional energy landscape, and learning corresponds to shaping this landscape by embedding attractor states.

This formalism lays the foundation for analyzing storage capacity, stability under noise, and retrieval dynamics. Subsequent sections of the paper explore these themes with both **deterministic** and **stochastic** approaches.


