<a href="https://colab.research.google.com/github/dvoils/neural-network-experiments/blob/main/energy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

Hopfield proposed that certain computational properties useful to organisms and computers can emerge **collectively** from large assemblies of simple interacting units (neurons). Instead of requiring complex circuitry, emergent computation arises spontaneously, analogous to physical systems such as magnetic domains or fluid vortices. His paper presents a model that exhibits **content-addressable memory**, error correction, generalization, and categorization, all emerging from the network's dynamics rather than from programmed instruction.


## Content-Addressable Memory and Dynamics

Let a memory be represented as a point $\mathbf{X} \in \mathbb{R}^N$. In certain physical systems (e.g., Ising models), dynamics defined by gradient descent in energy space drive states toward stable attractors:

$$
\frac{d\mathbf{X}}{dt} = -\nabla E(\mathbf{X})
$$

This system acts as a **content-addressable memory** if every partial or noisy input state $\mathbf{X}' \approx \mathbf{X}_a$ flows toward a stable point $\mathbf{X}_a$. Hopfield demonstrates that such dynamics can recover full memories from fragments.
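As a toy illustration of this kind of flow (not Hopfield's network itself), the sketch below integrates $d\mathbf{X}/dt = -\nabla E$ with forward Euler steps. The double-well energy $E(\mathbf{X}) = \sum_k (X_k^2 - 1)^2 / 4$, the step size, and the function names are all assumptions chosen for illustration; this energy has minima wherever every component is $\pm 1$.

```python
def grad_energy(x):
    """Gradient of the assumed energy E(x) = sum_k (x_k**2 - 1)**2 / 4."""
    return [xk * (xk * xk - 1.0) for xk in x]


def relax(x, step=0.1, n_steps=500):
    """Forward-Euler integration of dX/dt = -grad E(X)."""
    x = list(x)
    for _ in range(n_steps):
        g = grad_energy(x)
        x = [xk - step * gk for xk, gk in zip(x, g)]
    return x


target = [1.0, -1.0, 1.0, -1.0]      # a stable point of the assumed energy
noisy = [0.7, -1.3, 0.4, -0.2]       # corrupted version of the target
recovered = relax(noisy)
print([round(v, 3) for v in recovered])
```

Each component slides downhill into the nearest well, so the noisy state ends at the stored point whose signs it shares.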

## The Hopfield Model

Each of the $N$ neurons is binary:

$V_i \in \{0, 1\} \quad \text{or equivalently} \quad s_i = 2V_i - 1 \in \{-1, +1\}$

Neurons update asynchronously using the rule:

$$
V_i \leftarrow
\begin{cases}
1 & \text{if } \sum_j T_{ij} V_j > U_i \\
0 & \text{otherwise}
\end{cases}
$$

where $T_{ij}$ is the synaptic strength, and $U_i$ is the threshold (often taken to be 0).
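A minimal sketch of the asynchronous rule, assuming a common threshold $U_i = U$ and a hand-picked 3-neuron weight matrix (both hypothetical, chosen only to keep the example small):

```python
import random


def update_neuron(T, V, i, U=0.0):
    """Apply the threshold rule to neuron i in place; return True if V[i] changed."""
    total = sum(T[i][j] * V[j] for j in range(len(V)))
    new_state = 1 if total > U else 0
    changed = new_state != V[i]
    V[i] = new_state
    return changed


def async_sweep(T, V, rng):
    """Visit every neuron once in random order, updating one neuron at a time."""
    order = list(range(len(V)))
    rng.shuffle(order)
    return sum(update_neuron(T, V, i) for i in order)   # number of changes


# Hypothetical weights: neurons 0 and 1 excite each other, both inhibit neuron 2.
T = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
V = [1, 0, 1]
rng = random.Random(0)
while async_sweep(T, V, rng):        # repeat until a full sweep changes nothing
    pass
print(V)
```

Which stable state is reached can depend on the update order; for these weights the only stable states are `[0, 0, 0]` and `[1, 1, 0]`.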


## Hebbian Learning Rule

To store a set of binary patterns $\{\mathbf{V}^s\}_{s=1}^n$, Hopfield applies the Hebbian learning rule:

$$
T_{ij} = \sum_{s=1}^{n} (2V_i^s - 1)(2V_j^s - 1), \quad T_{ii} = 0
$$

This rule causes each stored pattern $\mathbf{V}^s$ to become a local minimum (attractor) in energy space:

$$
E = -\frac{1}{2} \sum_{i \neq j} T_{ij} V_i V_j
$$

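With symmetric weights ($T_{ij} = T_{ji}$) and $U_i = 0$, updating a single neuron $i$ can never increase the energy:

$$
\Delta E = -\Delta V_i \sum_{j \neq i} T_{ij} V_j \le 0
$$

Because the number of states is finite and $E$ never increases, asynchronous updates are guaranteed to converge to a stable state.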
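The storage rule, the energy function, and the asynchronous dynamics can be combined in a small pure-Python sketch. The two 16-neuron patterns, the corrupted probe, and all function names are assumptions made for illustration; the patterns were picked by hand so that recall does not depend on the update order.

```python
import random


def hebbian_weights(patterns):
    """T_ij = sum_s (2 V_i^s - 1)(2 V_j^s - 1), with T_ii = 0."""
    n = len(patterns[0])
    T = [[0] * n for _ in range(n)]
    for V in patterns:
        s = [2 * v - 1 for v in V]
        for i in range(n):
            for j in range(n):
                if i != j:
                    T[i][j] += s[i] * s[j]
    return T


def energy(T, V):
    """E = -1/2 sum_{i != j} T_ij V_i V_j."""
    n = len(V)
    return -0.5 * sum(T[i][j] * V[i] * V[j]
                      for i in range(n) for j in range(n) if i != j)


def recall(T, V, rng, max_sweeps=100):
    """Asynchronous threshold updates (U_i = 0) until a sweep changes nothing."""
    V = list(V)
    trace = [energy(T, V)]               # energy after every state change
    for _ in range(max_sweeps):
        changed = False
        order = list(range(len(V)))
        rng.shuffle(order)
        for i in order:
            total = sum(T[i][j] * V[j] for j in range(len(V)))
            new_state = 1 if total > 0 else 0
            if new_state != V[i]:
                V[i] = new_state
                changed = True
                trace.append(energy(T, V))
        if not changed:
            break
    return V, trace


p1 = [1, 0] * 8                          # hypothetical stored pattern
p2 = [1, 1, 0, 0] * 4                    # second stored pattern
T = hebbian_weights([p1, p2])

probe = list(p1)
probe[3] = 1                             # corrupt two bits of p1
probe[8] = 0
recovered, trace = recall(T, probe, random.Random(0))
print(recovered == p1, trace)
```

The recorded energies form a non-increasing sequence, and the probe settles back onto `p1`.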

## Capacity and Error Correction

* For $N$ neurons, the network can stably store about $0.15N$ random patterns before retrieval degrades.
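* Crosstalk from the other stored patterns is modeled as Gaussian noise on each neuron's input, which gives a per-bit error probability of

$$
P = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2} \, dt
$$

  where the lower limit $x$ is the ratio of the signal to the standard deviation of the noise. As more patterns are stored, $x$ shrinks and $P$ grows.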

* Simulations confirm recall is accurate for low pattern count and degrades as $n$ approaches $0.15N$.
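The Gaussian tail integral can be evaluated with the complementary error function. The sketch below assumes the signal-to-noise estimate from Hopfield's analysis, with mean signal $N/2$ and noise standard deviation $\sigma = \sqrt{(n-1)N/2}$, so that $x = (N/2)/\sigma$; the function name is hypothetical.

```python
import math


def bit_error_probability(N, n):
    """Gaussian tail probability P for N neurons and n stored patterns.

    Assumes signal N/2 and noise standard deviation sqrt((n - 1) * N / 2).
    """
    if n < 2:
        return 0.0                        # a single pattern has no crosstalk
    sigma = math.sqrt((n - 1) * N / 2.0)
    x = (N / 2.0) / sigma
    # Standard normal tail: Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))


for n in (5, 10, 15, 20):
    print(n, bit_error_probability(100, n))
```

For $N = 100$ and $n = 10$ this gives a per-bit error probability of roughly 0.009, and the value rises steadily as $n$ grows.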

## Categorization and Familiarity

* **Generalization**: The system categorizes ambiguous inputs by converging to the closest memory.
* **Familiarity**: High activation rates during convergence can indicate whether a pattern is familiar.
* **Categorical recall**: Close patterns may collapse into a shared attractor (useful for pattern completion).

## Extensions and Properties

* **Clipped Weights**: Even if $T_{ij} \in \{-1, 0, +1\}$, performance only degrades slightly.
* **Asymmetry**: Even non-symmetric $T_{ij} \neq T_{ji}$ can yield metastable attractors.
* **Forgetting**: Saturating synaptic strength (e.g., $T_{ij} \in [-3, 3]$) introduces natural forgetting.
* **Sequence Recall**: Adding asymmetric terms can allow short sequences $V^1 \to V^2 \to V^3 \to \dots$.

## Biological Plausibility

* The firing rate of a real neuron rises steeply with its input, which the model idealizes as a binary threshold.
* Hebbian learning ($\Delta T_{ij} \propto V_i V_j$) is biologically plausible.
* Delay and stochasticity are modeled via asynchronous updates.


## Conclusions

Hopfield demonstrates that associative memory and pattern completion can emerge as collective properties of simple neuron-like elements. These results suggest:

1. Neural computation does not require complex sequential logic.
2. Distributed systems can perform robust parallel computation.
3. The brain may exploit such physical dynamics for memory, recognition, and decision-making.
4. Hardware implementations (e.g., neuromorphic chips) could benefit from these ideas.

**Key Concepts**: attractor dynamics, energy minimization, Hebbian learning, content-addressable memory, neural computation, error correction, categorization.

**Citations**: J.J. Hopfield, PNAS, Vol. 79, pp. 2554–2558, 1982.


# 1. Models of Neural Networks

*A Statistical Mechanics Approach to Associative Memory*

## 1.1 Binary-State Neurons and Spin Variables

In this section, Amit, Gutfreund, and Sompolinsky present a powerful analogy between neural networks and disordered spin systems in statistical mechanics. This framework formalizes the behavior of networks such as the Hopfield model and the earlier Little model by treating each neuron as a binary unit. Neurons are modeled as Ising-like spins that can be in one of two discrete states: $S_i = +1$ indicates that a neuron is firing (active), and $S_i = -1$ indicates that it is not firing (inactive). These binary states correspond to the presence or absence of electrochemical activity over short time intervals, typically on the order of milliseconds.

The state of the entire network at a given time $t$, associated with a particular memory pattern labeled by $\alpha$, is represented by the vector:

$$
| \alpha, t \rangle = | S_1^\alpha, S_2^\alpha, \ldots, S_N^\alpha; t \rangle
\tag{1.1}
$$

This vector captures the configuration of all $N$ neurons, where each $S_i^\alpha$ is the state of the $i$-th neuron in memory pattern $\alpha$. The network's phase space consists of $2^N$ discrete binary configurations, each representing a possible microstate of the system. These memory states form the foundational units of computation and recall in the model.

## 1.2 Synaptic Connectivity and Postsynaptic Potentials

Neurons are coupled through synaptic junctions with connection strengths $J_{ij}$, which determine the influence of neuron $j$ on neuron $i$. These interactions may be excitatory (positive $J_{ij}$) or inhibitory (negative $J_{ij}$). During a short integration period, each neuron aggregates input from other neurons. The total **postsynaptic potential** $V_i$ received by neuron $i$ is defined as:

$$
V_i = \sum_j J_{ij} (S_j + 1)
\tag{1.2}
$$

This formulation maps the spin values $S_j \in \{-1, +1\}$ into the domain $\{0, 2\}$, ensuring that only active neurons contribute to the synaptic sum. Specifically, an inactive presynaptic neuron ($S_j = -1$) contributes nothing to $V_i$, while an active one ($S_j = +1$) contributes $2J_{ij}$. This encoding preserves biological plausibility, as silent neurons do not influence postsynaptic potentials.
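Equation (1.2) translates directly into code. The 3-neuron coupling matrix below is a made-up example, and the function name is an assumption:

```python
def postsynaptic_potentials(J, S):
    """V_i = sum_j J_ij (S_j + 1) for spins S_j in {-1, +1}."""
    return [sum(Jij * (Sj + 1) for Jij, Sj in zip(row, S)) for row in J]


# Hypothetical symmetric couplings with no self-coupling.
J = [[0.0, 0.5, -0.25],
     [0.5, 0.0, 1.0],
     [-0.25, 1.0, 0.0]]
S = [+1, -1, +1]                 # neuron 1 is silent

V = postsynaptic_potentials(J, S)
print(V)                         # [-0.5, 3.0, -0.5]
```

The silent neuron 1 drops out of every sum, while each active neuron contributes twice its coupling.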

## 1.3 Thresholding and Neuronal Stability

Once the total potential $V_i$ is computed, it is compared to a threshold value $U_i$. The difference $h_i = V_i - U_i$ is known as the **molecular field** or local input field. A neuron is said to be stable when its state $S_i$ is aligned with its molecular field. This condition is expressed as:

$$
S_i h_i = S_i (V_i - U_i) > 0
\tag{1.3}
$$

If this inequality is satisfied, the neuron is energetically stable and will not change state. If the inequality is violated, the neuron is misaligned with its input field and will flip its state in the next update. This condition ensures that the network only evolves through **state transitions that reduce the system's energy**, pushing the system toward more stable configurations.
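The stability condition (1.3) can be checked neuron by neuron. This sketch assumes the threshold choice $U_i = \sum_j J_{ij}$ and a made-up 3-neuron coupling matrix; the function names are hypothetical, and a neuron with $S_i h_i = 0$ is counted as unstable.

```python
def local_fields(J, S, U):
    """h_i = V_i - U_i, with V_i = sum_j J_ij (S_j + 1)."""
    return [sum(Jij * (Sj + 1) for Jij, Sj in zip(row, S)) - Ui
            for row, Ui in zip(J, U)]


def unstable_neurons(J, S, U):
    """Indices i that violate the stability condition S_i h_i > 0."""
    h = local_fields(J, S, U)
    return [i for i, (Si, hi) in enumerate(zip(S, h)) if Si * hi <= 0]


# Hypothetical symmetric couplings with no self-coupling.
J = [[0.0, 0.5, -0.25],
     [0.5, 0.0, 1.0],
     [-0.25, 1.0, 0.0]]
U = [sum(row) for row in J]      # assumed thresholds U_i = sum_j J_ij

print(unstable_neurons(J, [+1, +1, +1], U))   # [] : every spin is aligned
print(unstable_neurons(J, [-1, +1, +1], U))   # [0]: neuron 0 opposes its field
```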

## 1.4 The Hamiltonian and Energy Descent

The global behavior of the network is governed by a scalar energy function, known as the **Hamiltonian**. It measures the total energy of the system in a given configuration. The Hamiltonian is defined as:

$$
H = -\frac{1}{2} \sum_i h_i S_i = -\frac{1}{2} \sum_{i,j} J_{ij} S_i S_j
\tag{1.4}
$$

The first form of the Hamiltonian expresses the alignment between each neuron and its local input field. The second form follows when the threshold potentials are chosen such that $U_i = \sum_j J_{ij}$: the constant terms in Equation (1.2) then cancel, leaving $h_i = \sum_j J_{ij} S_j$, so the energy is entirely determined by inter-neuronal interactions. The factor of $\frac{1}{2}$ compensates for the double sum counting each pair $(i, j)$ twice, and the synaptic weights are assumed symmetric, $J_{ij} = J_{ji}$.

This energy function serves as a Lyapunov function for the system: the dynamics defined by the update rule in Equation (1.3) guarantee that each neuron flip reduces or leaves unchanged the value of $H$. Therefore, the system follows a path of **monotonic energy descent** and converges to a local minimum of the Hamiltonian. These minima correspond to **stable memory configurations**.
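The descent property can be verified in a few lines, assuming symmetric couplings, $J_{kk} = 0$, and the local field $h_k = \sum_j J_{kj} S_j$ (that is, $U_k = \sum_j J_{kj}$). Collecting the terms of the pairwise form of $H$ that contain a given spin $S_k$:

$$
H_k = -\frac{1}{2} \Big( \sum_j J_{kj} S_k S_j + \sum_i J_{ik} S_i S_k \Big) = -S_k \sum_j J_{kj} S_j = -S_k h_k
$$

Flipping $S_k \to -S_k$ leaves $h_k$ unchanged (it does not depend on $S_k$) and reverses the sign of $H_k$, so

$$
\Delta H = (+S_k h_k) - (-S_k h_k) = 2 S_k h_k
$$

A flip happens only when condition (1.3) is violated, $S_k h_k \le 0$, hence $\Delta H \le 0$ for every update.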

## 1.5 Memory Storage via Hebbian Learning

The storage of memory patterns in the network is achieved by properly selecting the synaptic weights $J_{ij}$. The authors adopt a **Hebbian learning rule**, inspired by the principle that co-activated neurons should strengthen their mutual connection. Given a set of $p$ memory patterns $\{ \boldsymbol{\xi}^\mu \}_{\mu=1}^p$, the synaptic weights are defined as:

$$
J_{ij} = \frac{1}{N} \sum_{\mu=1}^p \xi_i^\mu \xi_j^\mu, \quad i \ne j
\tag{1.5}
$$

Each pattern $\boldsymbol{\xi}^\mu \in \{-1, +1\}^N$ represents a full binary configuration of the network and is assumed to be randomly sampled. The couplings $J_{ij}$ encode correlations between neurons across all stored patterns. The diagonal elements $J_{ii}$ are set to zero, reflecting the absence of self-coupling. For $p$ small compared with $N$, this construction makes each pattern $\boldsymbol{\xi}^\mu$ a **stable fixed point** of the network dynamics, that is, a local minimum of the Hamiltonian to which the system can converge during recall.
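A short sketch of Equation (1.5), followed by a check that a stored pattern satisfies the stability condition $S_i h_i > 0$. It assumes the local field $h_i = \sum_j J_{ij} S_j$ (thresholds $U_i = \sum_j J_{ij}$) and uses two hand-picked orthogonal patterns rather than random ones, so the numbers are easy to verify by hand; the function names are hypothetical.

```python
def hebbian_couplings(patterns):
    """J_ij = (1/N) sum_mu xi_i^mu xi_j^mu for i != j, and J_ii = 0."""
    N = len(patterns[0])
    J = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i != j:
                J[i][j] = sum(xi[i] * xi[j] for xi in patterns) / N
    return J


def alignment(J, S):
    """S_i * h_i with h_i = sum_j J_ij S_j; all positive means S is stable."""
    return [Si * sum(Jij * Sj for Jij, Sj in zip(row, S))
            for Si, row in zip(S, J)]


xi1 = [1, -1] * 8                # hypothetical pattern, N = 16
xi2 = [1, 1, -1, -1] * 4         # orthogonal to xi1
J = hebbian_couplings([xi1, xi2])

print(min(alignment(J, xi1)), min(alignment(J, xi2)))
```

For these two patterns every neuron has $S_i h_i = (N - 2)/N = 0.875$, so both are fixed points of the dynamics.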

## 1.6 Summary

This section lays out the theoretical and physiological foundation for modeling neural networks as spin-glass systems. Each neuron is represented as a binary spin, and its interaction with other neurons is mediated by synaptic couplings that are symmetric and determined through Hebbian learning. The dynamics of the system are governed by an energy function, which ensures convergence toward stable configurations that represent memorized patterns. In this way, the network performs associative memory by evolving toward attractors in a rugged energy landscape, a concept borrowed directly from disordered systems in statistical mechanics.

