<a href="https://colab.research.google.com/github/dvoils/neural-network-experiments/blob/main/energy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

Hopfield proposed that certain computational properties useful to organisms and computers can emerge **collectively** from large assemblies of simple interacting units (neurons). Instead of requiring complex circuitry, emergent computation arises spontaneously, analogous to physical systems such as magnetic domains or fluid vortices. The paper presents a model that exhibits **content-addressable memory**, error correction, generalization, and categorization, all emerging from the dynamics rather than from programmed instruction.


## Content-Addressable Memory and Dynamics

Let a memory be represented as a point $\mathbf{X} \in \mathbb{R}^N$. In certain physical systems (e.g., Ising models), dynamics defined by gradient descent in energy space drive states toward stable attractors:

$$
\frac{d\mathbf{X}}{dt} = -\nabla E(\mathbf{X})
$$

This system acts as a **content-addressable memory** if every partial or noisy input state $\mathbf{X}' \approx \mathbf{X}_a$ flows toward a stable point $\mathbf{X}_a$. Hopfield demonstrates that such dynamics can recover full memories from fragments.
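
As a minimal sketch of this idea, the flow $d\mathbf{X}/dt = -\nabla E$ can be integrated numerically for a one-dimensional toy energy. The double-well $E(x) = (x^2 - 1)^2/4$, the step size, and the function names are illustrative assumptions, not part of Hopfield's model; the two minima at $x = \pm 1$ play the role of stored memories.

```python
def grad_E(x):
    """Gradient of the toy double-well energy E(x) = (x**2 - 1)**2 / 4."""
    return x**3 - x

def relax(x, step=0.1, n_steps=200):
    """Forward-Euler integration of dx/dt = -dE/dx."""
    for _ in range(n_steps):
        x -= step * grad_E(x)
    return x

recalled_pos = relax(0.3)    # partial cue on the side of the memory at +1
recalled_neg = relax(-0.2)   # partial cue on the side of the memory at -1
print(recalled_pos, recalled_neg)
```

Each cue settles into the minimum whose basin it starts in, which is the sense in which the dynamics "address" a memory by its content.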

## The Hopfield Model

Each of the $N$ neurons is binary:

$V_i \in \{0, 1\} \quad \text{or equivalently} \quad s_i = 2V_i - 1 \in \{-1, +1\}$

Neurons update asynchronously using the rule:

$$
V_i \leftarrow
\begin{cases}
1 & \text{if } \sum_j T_{ij} V_j > U_i \\
0 & \text{otherwise}
\end{cases}
$$

where $T_{ij}$ is the synaptic strength, and $U_i$ is the threshold (often taken to be 0).
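
The update rule can be sketched in a few lines of plain Python. The function name `async_update`, the random visiting order, and the three-neuron weight matrix are assumptions made for illustration; only the threshold comparison itself comes from the rule above.

```python
import random

def async_update(T, V, U=None, sweeps=1, rng=None):
    """Asynchronously update binary neurons V_i in {0, 1}.

    T[i][j] is the synaptic strength and U[i] the threshold (0 by default).
    Neurons are visited one at a time in random order, and each update
    sees the most recent values of all other neurons.
    """
    rng = rng or random.Random(0)
    n = len(V)
    U = U if U is not None else [0] * n
    V = list(V)
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):
            total = sum(T[i][j] * V[j] for j in range(n))
            V[i] = 1 if total > U[i] else 0
    return V

# Two mutually excitatory neurons (0 and 1) and a third that both inhibit.
T = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
stable = async_update(T, [1, 1, 0])    # already a fixed point of the rule
decayed = async_update(T, [0, 0, 1])   # neuron 2 receives no support
print(stable, decayed)
```

The state `[1, 1, 0]` is left unchanged whatever the visiting order, while `[0, 0, 1]` always relaxes to `[0, 0, 0]` because neuron 2 never receives input above threshold.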


## Hebbian Learning Rule

To store a set of binary patterns $\{\mathbf{V}^s\}_{s=1}^n$, Hopfield applies the Hebbian learning rule:

$$
T_{ij} = \sum_{s=1}^{n} (2V_i^s - 1)(2V_j^s - 1), \quad T_{ii} = 0
$$
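
A direct transcription of the storage rule into Python might look as follows. The helper name `hebbian_weights` and the two four-neuron patterns are hypothetical examples.

```python
def hebbian_weights(patterns):
    """T_ij = sum_s (2 V_i^s - 1)(2 V_j^s - 1), with T_ii = 0."""
    n = len(patterns[0])
    T = [[0] * n for _ in range(n)]
    for V in patterns:
        s = [2 * v - 1 for v in V]          # map {0, 1} -> {-1, +1}
        for i in range(n):
            for j in range(n):
                if i != j:
                    T[i][j] += s[i] * s[j]
    return T

patterns = [[1, 0, 1, 0], [1, 1, 0, 0]]
T = hebbian_weights(patterns)
for row in T:
    print(row)
```

For these two patterns the products $s_i s_j$ agree only for the pairs $(0,3)$ and $(1,2)$, so those are the only nonzero entries (both equal to $-2$); every other contribution cancels.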

This rule causes each stored pattern $\mathbf{V}^s$ to become a local minimum (attractor) in energy space:

$$
E = -\frac{1}{2} \sum_{i \neq j} T_{ij} V_i V_j
$$

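With symmetric $T_{ij}$ and $U_i = 0$, changing neuron $i$ by $\Delta V_i$ under the update rule never increases the energy:

$$
\Delta E = -\Delta V_i \sum_{j \neq i} T_{ij} V_j \le 0
$$

Since $E$ is bounded below on a finite state space, asynchronous updates converge to a stable state.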
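
The energy argument can be checked numerically. The sketch below uses hypothetical random symmetric integer couplings with zero diagonal (not Hebbian weights) and thresholds $U_i = 0$, and records the energy after every single-neuron update.

```python
import random

def energy(T, V):
    """E = -1/2 * sum_{i != j} T_ij V_i V_j for binary V_i in {0, 1}."""
    n = len(V)
    return -0.5 * sum(T[i][j] * V[i] * V[j]
                      for i in range(n) for j in range(n) if i != j)

rng = random.Random(1)
n = 12
# Hypothetical couplings: symmetric integers with zero diagonal.
T = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        T[i][j] = T[j][i] = rng.choice([-2, -1, 1, 2])

V = [rng.randint(0, 1) for _ in range(n)]
trace = [energy(T, V)]
for _ in range(20 * n):                      # asynchronous single-neuron updates
    i = rng.randrange(n)
    V[i] = 1 if sum(T[i][j] * V[j] for j in range(n)) > 0 else 0
    trace.append(energy(T, V))

non_increasing = all(b <= a for a, b in zip(trace, trace[1:]))
print(trace[0], trace[-1], non_increasing)
```

The check relies on the symmetry of `T`; with asymmetric couplings the same loop can produce energy increases.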

## Capacity and Error Correction

* For $N$ neurons, the network can stably store about $0.15N$ random patterns before retrieval degrades.
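* Crosstalk from the other stored patterns is modeled as Gaussian noise, so the probability of a single-bit error is the Gaussian tail beyond the signal-to-noise ratio $x$ (mean signal divided by the standard deviation of the crosstalk):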

$$
P = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2} dt
$$

* Simulations confirm recall is accurate for low pattern count and degrades as $n$ approaches $0.15N$.
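
The tail integral can be evaluated with the complementary error function, since $P = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$. The text does not define the lower limit $x$; the sketch below assumes $x = \sqrt{N / (2(n-1))}$, which corresponds to a mean signal of $N/2$ against crosstalk of standard deviation $\sqrt{(n-1)N/2}$. That choice is one reading of Hopfield's estimate and should be checked against the paper.

```python
import math

def bit_error_probability(N, n):
    """Gaussian tail P = Q(x), ASSUMING x = sqrt(N / (2 * (n - 1)))."""
    x = math.sqrt(N / (2 * (n - 1)))
    return 0.5 * math.erfc(x / math.sqrt(2))

# Error estimate for N = 100 neurons as the number of stored patterns grows.
for n in (5, 10, 15):
    print(n, bit_error_probability(100, n))
```

Under this assumption the per-bit error stays below one percent for $n = 10$ and grows steadily as $n$ approaches $0.15N$.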

## Categorization and Familiarity

* **Generalization**: The system categorizes ambiguous inputs by converging to the closest memory.
* **Familiarity**: High activation rates during convergence can indicate whether a pattern is familiar.
* **Categorical recall**: Close patterns may collapse into a shared attractor (useful for pattern completion).

## Extensions and Properties

* **Clipped Weights**: Even if $T_{ij} \in \{-1, 0, +1\}$, performance only degrades slightly.
* **Asymmetry**: Even non-symmetric $T_{ij} \neq T_{ji}$ can yield metastable attractors.
* **Forgetting**: Saturating synaptic strength (e.g., $T_{ij} \in [-3, 3]$) introduces natural forgetting.
* **Sequence Recall**: Adding asymmetric terms can allow short sequences $V^1 \to V^2 \to V^3 \to \dots$.

## Biological Plausibility

* Real neurons exhibit firing rates that approximate binary thresholds.
* Hebbian learning ($\Delta T_{ij} \propto V_i V_j$) is biologically plausible.
* Delay and stochasticity are modeled via asynchronous updates.


## Conclusions

Hopfield demonstrates that associative memory and pattern completion can emerge as collective properties of simple neuron-like elements. These results suggest:

1. Neural computation does not require complex sequential logic.
2. Distributed systems can perform robust parallel computation.
3. The brain may exploit such physical dynamics for memory, recognition, and decision-making.
4. Hardware implementations (e.g., neuromorphic chips) could benefit from these ideas.

**Key Concepts**: attractor dynamics, energy minimization, Hebbian learning, content-addressable memory, neural computation, error correction, categorization.

**Citations**: J.J. Hopfield, PNAS, Vol. 79, pp. 2554–2558, 1982.



# 1.1 Representing Neural Memory States

In statistical models of associative memory, a network of $N$ binary neurons can store and retrieve discrete memory patterns by evolving toward specific stable configurations. Each neuron $i \in \{1, 2, \ldots, N\}$ assumes a value $S_i \in \{-1, +1\}$, representing either a quiescent or active state, respectively. The complete state of the system at a given moment is defined by the collection of these binary variables, which form a configuration vector in $N$-dimensional binary space.

To formally define the storage of memory patterns, we adopt the notation:

$$
| \alpha, t \rangle = | S_1^\alpha, S_2^\alpha, \ldots, S_N^\alpha \rangle
\tag{1.1}
$$

Here, $\alpha \in \{1, 2, \ldots, p\}$ indexes the memory pattern, with $p$ denoting the total number of patterns stored in the network. The superscript $\alpha$ on each $S_i^\alpha$ identifies the value of neuron $i$ in the $\alpha$-th pattern. The ket notation $| \alpha, t \rangle$ is borrowed from quantum mechanics and serves to emphasize that the system configuration is treated as a discrete state vector in a high-dimensional space of neuron activations.

Each memory pattern $\alpha$ corresponds to a unique point in this configuration space, and is defined by a fixed binary string:

$$
\boldsymbol{S}^\alpha = (S_1^\alpha, S_2^\alpha, \ldots, S_N^\alpha)
$$

These pattern vectors are embedded into the network via synaptic interactions and serve as **attractors** in the system’s energy landscape. During retrieval, a noisy or partial input pattern evolves over time toward the nearest stored pattern—typically by minimizing an appropriately defined energy function.

Although the time index $t$ appears in $| \alpha, t \rangle$, in this context it does not imply temporal evolution of the pattern itself. Rather, it indicates that at time $t$, the system is in—or converging toward—the stored pattern $\alpha$. In subsequent sections of the model, this interpretation becomes central to the retrieval dynamics governed by spin-glass-inspired energy minimization.

Equation (1.1) encapsulates the foundational idea of attractor memory: each $\boldsymbol{S}^\alpha$ represents a learned configuration that the neural network is capable of recalling via its intrinsic dynamics. The collection of these stored states defines the computational repertoire of the network.
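
To make the representation concrete, the stored patterns can be held as tuples of $\pm 1$ values indexed by $\alpha$. The normalised overlap used below to find the nearest stored pattern is a standard similarity measure introduced here for illustration; the patterns and the probe state are hypothetical.

```python
def overlap(a, b):
    """Normalised overlap (1/N) * sum_i a_i b_i between two +/-1 configurations."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

# p = 2 hypothetical stored patterns over N = 6 neurons, keyed by alpha.
patterns = {
    1: (+1, +1, +1, -1, -1, -1),
    2: (+1, -1, +1, -1, +1, -1),
}
state = (+1, +1, -1, -1, -1, -1)   # pattern 1 with its third neuron flipped

overlaps = {alpha: overlap(state, S) for alpha, S in patterns.items()}
nearest = max(overlaps, key=overlaps.get)
print(overlaps, nearest)
```

An overlap of $1$ means the state coincides with the pattern and $0$ means it is uncorrelated with it; here the corrupted state remains closest to pattern 1.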


# 1.2 Synaptic Input (Local Potential)

To determine whether a neuron will flip its state, the model assigns each neuron a **local synaptic potential** $V_i$.  In the notation of *Amit, Gutfreund & Sompolinsky* this potential is defined by

$$
V_i \;=\; \sum_{j} J_{i,j}\,\bigl(S_j + 1\bigr)
\tag{1.2}
$$

Where,

* **$V_i$** – the total input (sometimes called the *local field* or *membrane potential*) experienced by neuron $i$.
* **$J_{i,j}$** – the fixed synaptic coupling from neuron $j$ to neuron $i$.  These couplings were set during learning (later formalised by the Hebbian rule in Eq. 1.5).
* **$S_j\in\{-1,+1\}$** – the present state of neuron $j$.  A value $+1$ stands for “active/firing,” while $-1$ stands for “inactive/silent.”
* **Shift term $(S_j+1)$** – by adding $1$ the authors map the binary alphabet $\{-1,+1\}$ to $\{0,2\}$.

  * If $S_j = -1$ (silent) then $S_j+1 = 0$: neuron $j$ contributes **nothing** to $V_i$.
  * If $S_j = +1$ (active) then $S_j+1 = 2$: neuron $j$ contributes **$2J_{i,j}$** to $V_i$.

This “rectified” form ensures that only active presynaptic neurons inject current; silent neurons are effectively ignored.  Equivalently, $V_i = \sum_j J_{i,j}S_j + \sum_j J_{i,j}$, so the shift adds a state-independent offset that can later be absorbed into the choice of threshold.

## Neural‐network interpretation

Equation (1.2) states that neuron $i$ integrates weighted inputs from every other neuron that is currently firing.  Once the potential $V_i$ is computed, a deterministic update rule (introduced immediately after Eq. 1.2 in the paper) flips the neuron according to whether $V_i$ exceeds a threshold $U_i$:

$$
S_i(t+1) =
\begin{cases}
+1, & V_i(t) \;>\; U_i,\\[4pt]
-1, & V_i(t) \;\le\; U_i.
\end{cases}
$$

When each threshold is set to $U_i=\sum_j J_{i,j}$, the constant offset introduced by the shift cancels and $V_i - U_i = \sum_j J_{i,j}S_j$, so the rule reduces to the familiar “sign” rule often written $S_i(t+1)=\operatorname{sgn}\bigl(\sum_j J_{i,j}S_j(t)\bigr)$.
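
The relation between the shifted potential of Eq. (1.2) and the field that appears in the sign rule is easy to verify numerically: expanding the sum gives $V_i = \sum_j J_{i,j}S_j + \sum_j J_{i,j}$. The coupling matrix and state below are hypothetical values chosen only for the check.

```python
def local_potential(J, S, i):
    """V_i = sum_j J_ij (S_j + 1), Eq. (1.2)."""
    return sum(J[i][j] * (S[j] + 1) for j in range(len(S)))

def sign_field(J, S, i):
    """sum_j J_ij S_j, the argument of the sign rule."""
    return sum(J[i][j] * S[j] for j in range(len(S)))

J = [[0, 2, -1],
     [2, 0, 3],
     [-1, 3, 0]]
S = [+1, -1, +1]

for i in range(3):
    row_sum = sum(J[i])            # state-independent offset sum_j J_ij
    print(i, local_potential(J, S, i), sign_field(J, S, i) + row_sum)
```

The two printed columns agree for every neuron, which shows that the offset $\sum_j J_{i,j}$ is exactly what a threshold has to cancel for the two formulations to coincide.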

## Statistical‐physics perspective

From the spin-glass viewpoint, the shift $(S_j+1)$ is a convenience: it separates the **interaction** between spins (captured by $J_{i,j}S_j$) from a constant “background field” $\sum_j J_{i,j}$.  Whether one works with $\{0,2\}$ or $\{-1,+1\}$ is immaterial to the thermodynamics as long as the Hamiltonian is defined consistently.  What matters is that the network’s dynamics still amount to **energy descent** toward stored attractors.

## Key takeaway

Equation (1.2) formalises how each neuron translates the collective activity of the network into a single scalar $V_i$.  By adding the offset $+1$, the authors guarantee that **only active presynaptic neurons contribute to $V_i$** (with a sign set by $J_{i,j}$), simplifying later analytical work without changing the essential physics of attractor formation.




# 1.3 Neuron Update Condition: Threshold-Based Dynamics

In the spin-glass neural network framework, each neuron evaluates whether to flip its state by comparing its **local input** to a **threshold**. This comparison is captured in Equation (1.3):

$$
S_i h_i = S_i (V_i - U_i) > 0
\tag{1.3}
$$

Where,

* $S_i \in \{-1, +1\}$: the current state of neuron $i$.
* $V_i$: the **local synaptic input** or potential to neuron $i$, defined in Equation (1.2) as:

  $$
  V_i = \sum_j J_{i,j}(S_j + 1)
  $$
* $U_i$: the **threshold** for neuron $i$, determining the input level required for it to become active.
* $h_i = V_i - U_i$: the **net driving input** to neuron $i$.

Equation (1.3) expresses a condition for *stability* of the neuron’s current state:

> A neuron is said to be stable if its state $S_i$ is **aligned** with its net input $h_i$.

This is enforced by checking whether the product:

$$
S_i h_i = S_i (V_i - U_i)
$$

is **positive**. If it is, then the neuron is **already in the correct state** (aligned with its input). If it’s negative, the neuron is **misaligned** and will flip in the next update.

## Neural Dynamics

The rule is **deterministic**: each neuron flips only if doing so **reduces the system’s energy** (as we'll see in Eq. 1.4). So, the update proceeds as:

* If $S_i h_i > 0$: neuron $i$ is **stable**, and no change occurs.
* If $S_i h_i < 0$: neuron $i$ is **unstable**, and will flip in the next iteration:

$$
S_i \rightarrow -S_i
$$

This is a discrete analogue of **gradient descent** on the energy landscape, taken one spin at a time.

## Physical Analogy

From the spin-glass point of view, this condition reflects whether the **spin $S_i$** is aligned with the **local effective magnetic field** $h_i$. Stability corresponds to spins minimizing the energy contribution of their local field.
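
The stability test of Eq. (1.3) can be sketched as a small helper that lists the misaligned neurons. The function names, the coupling matrix, and the choice of thresholds $U_i = \sum_j J_{i,j}$ (which makes $h_i$ equal to $\sum_j J_{i,j}S_j$) are assumptions for illustration.

```python
def net_input(J, S, U, i):
    """h_i = V_i - U_i with V_i = sum_j J_ij (S_j + 1)."""
    V_i = sum(J[i][j] * (S[j] + 1) for j in range(len(S)))
    return V_i - U[i]

def unstable_neurons(J, S, U):
    """Indices i that violate Eq. (1.3), i.e. with S_i * h_i < 0."""
    return [i for i in range(len(S)) if S[i] * net_input(J, S, U, i) < 0]

J = [[0, 2, -1],
     [2, 0, 3],
     [-1, 3, 0]]
U = [sum(row) for row in J]        # assumed thresholds: U_i = sum_j J_ij
S = [-1, +1, +1]

before = unstable_neurons(J, S, U)
for i in before:                   # flip every misaligned neuron once
    S[i] = -S[i]
after = unstable_neurons(J, S, U)
print(before, after)
```

In this example only neuron 0 is misaligned, and flipping it leaves every neuron aligned with its net input.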


# 1.4 Energy Function of the Network: The Hamiltonian

The evolution of the neural network can be understood as a process of **energy minimization**. Stable states (memories) correspond to local minima of an energy landscape. This energy is formalized by the **Hamiltonian** $H$, given in Equation (1.4):

$$
H = -\frac{1}{2} \sum_i h_i S_i = -\frac{1}{2} \sum_{i,j} J_{ij} S_i S_j
\tag{1.4}
$$

Where,

* $H$: The **Hamiltonian**, or total energy, of the neural network.
* $S_i \in \{-1, +1\}$: The state of neuron (spin) $i$.
* $h_i$: The **molecular field** (or local effective field) acting on neuron $i$, defined by:

  $$
  h_i = \sum_j J_{ij} S_j
  $$

  which equals $V_i - U_i$ when the threshold is chosen as $U_i = \sum_j J_{ij}$, so that the constant offset in Eq. (1.2) cancels.
* $J_{ij}$: The **synaptic coupling** from neuron $j$ to neuron $i$, assumed symmetric: $J_{ij} = J_{ji}$.

## Interpretation

This Hamiltonian encodes the system’s tendency to evolve toward configurations where the spins are **aligned with their molecular fields**. The negative sign ensures that the system minimizes energy when spins and fields are aligned:

$$
S_i = \text{sign}(h_i)
$$

In the first form,

$$
H = -\frac{1}{2} \sum_i h_i S_i
$$

we interpret the energy as a sum over all neurons, each contributing based on the alignment between its state $S_i$ and the local field $h_i$.

In the second, equivalent form,

$$
H = -\frac{1}{2} \sum_{i,j} J_{ij} S_i S_j
$$

the Hamiltonian is expressed directly in terms of pairwise spin interactions. The factor of $\frac{1}{2}$ corrects for double-counting each symmetric pair $(i,j)$.

## Dynamical Implication

Because the update rule (Eq. 1.3) flips a spin only if it **decreases the energy**, the entire system evolves by **sequential energy descent**:

* Each individual spin flip that satisfies $S_i h_i < 0$ leads to $\Delta H < 0$.
* The dynamics are **dissipative** — the system moves toward **local minima** of $H$, which are the **stable states or attractors**.
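
For symmetric couplings with $J_{ii} = 0$, flipping spin $i$ changes the pairwise Hamiltonian by $\Delta H = 2 S_i h_i$ (with $S_i$ and $h_i$ taken before the flip), so a flip with $S_i h_i < 0$ always lowers the energy. The sketch below checks this identity on hypothetical random $\pm 1$ couplings and runs sequential flips until no spin is misaligned.

```python
import random

def hamiltonian(J, S):
    """H = -1/2 * sum_{i,j} J_ij S_i S_j."""
    n = len(S)
    return -0.5 * sum(J[i][j] * S[i] * S[j] for i in range(n) for j in range(n))

def field(J, S, i):
    """h_i = sum_j J_ij S_j."""
    return sum(J[i][j] * S[j] for j in range(len(S)))

rng = random.Random(0)
n = 10
# Hypothetical couplings: symmetric, +/-1 valued, zero diagonal.
J = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = rng.choice([-1, 1])

S = [rng.choice([-1, 1]) for _ in range(n)]
energies = [hamiltonian(J, S)]
predicted = []                      # 2 * S_i * h_i evaluated before each flip
swept_clean = False
while not swept_clean:
    swept_clean = True
    for i in range(n):
        s_h = S[i] * field(J, S, i)
        if s_h < 0:                 # misaligned spin: flip it
            predicted.append(2 * s_h)
            S[i] = -S[i]
            energies.append(hamiltonian(J, S))
            swept_clean = False

observed = [b - a for a, b in zip(energies, energies[1:])]
print(observed == predicted, energies[0], energies[-1])
```

The loop must terminate because every flip strictly lowers $H$ and the state space is finite; the final configuration is a local minimum in which every spin is aligned with its field.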

## Physics Analogy

In a spin glass, spins interact via both ferromagnetic and antiferromagnetic couplings, leading to **frustration** — where not all pairwise interactions can be simultaneously satisfied. This creates a **rugged energy landscape** with many local minima. In the neural model, these minima correspond to **memorized patterns**.

This equation formally aligns the neural network with the **Sherrington-Kirkpatrick (SK) model** of spin glasses, except that here, the couplings $J_{ij}$ are **structured** — designed through learning — rather than randomly drawn.

