# Master Comparison Table Across Generative Model Families  


| Aspect | Hopfield Network | Boltzmann Machine (BM) | Restricted Boltzmann Machine (RBM) | Deep Belief Network (DBN) | Diffusion Models (DDPM, Score Models) |
|-------|------------------|-------------------------|-------------------------------------|-----------------------------|-----------------------------------------|
| **Year / Pioneers** | 1982 – Hopfield | 1983–85 – Hinton, Ackley, Sejnowski | 1986 – Smolensky (harmonium); popularized by Hinton (2002) | 2006 – Hinton, Osindero, Teh | 2015–2021 – Sohl-Dickstein (2015) → Song & Ermon (2019) → Ho et al. (2020) → Song et al. (2021) |
| **Core Purpose** | Memory retrieval (associative memory) | Full probabilistic generative model | Efficient generative model; feature learning | Deep hierarchical generative model | High-quality generative modeling (images, audio, video) |
| **State Type** | Deterministic binary neurons (+1/−1) | Stochastic binary neurons | Stochastic binary visible and hidden units | Stacked stochastic layers | Continuous real-valued signals corrupted by Gaussian noise |
| **Energy Function** | Quadratic symmetric energy | Full Boltzmann energy with hidden units | Bipartite energy \(E(v,h)\) | Per-layer RBM energies (used during greedy training) | No explicit energy; learns the score \(\nabla_x \log p_t(x)\) |
| **Stochasticity Level** | None (deterministic) | Fully stochastic Gibbs sampling | Stochastic but simplified | Layer-wise stochastic sampling | Continuous stochastic differential equations (SDEs) |
| **Temperature Control** | Not used | Yes (controls randomness) | Typically fixed at \(T=1\) | Implicit via RBMs | Central idea via noise schedule \(\beta_t\) |
| **Input–Output Relation** | Attractor memory; converges to stored pattern | Joint distribution \(P(v,h)\) | Restricted joint distribution | Hierarchical generative mapping | No direct mapping; diffusion trajectory |
| **Inference Mechanism** | Energy minimization | Long MCMC chains | Single-step conditional sampling | Downward sampling using trained RBMs | Reverse SDE or denoising ODE |
| **Training Difficulty** | Very easy (Hebbian learning) | Extremely difficult (intractable partition function \(Z\)) | Moderate (Contrastive Divergence) | Hard but solvable via greedy layer-wise CD | Requires large nets but stable and scalable |
| **Partition Function \(Z\)** | Not applicable (deterministic dynamics) | Intractable | Still intractable; CD avoids computing it | Inherited from RBMs | Not required (score-based models) |
| **Sampling Process** | Deterministic convergence | Gibbs sampling; slow | Efficient Gibbs sampling | Layer-wise top-down sampling | Start from noise → iterative denoising |
| **Memory vs. Generation** | Memory only | Generative | Generative + feature extraction | Generative hierarchical model | High-quality generative modeling |
| **Architecture** | Fully connected symmetric | Fully connected symmetric | Bipartite graph (no intra-layer edges) | Stack of RBMs | U-Net backbone + noise schedule |
| **Mathematical Philosophy** | Energy minimization | Gibbs distribution; statistical physics | Structured energy modeling | Deep latent generative process | Probabilistic diffusion in continuous space |
| **Convergence Target** | Energy minima | Low-energy samples | Data distribution | Hierarchical latent representation | Data manifold via denoising |
| **Best Known Uses** | Associative memory | Theoretical generative modeling | Pretraining and feature discovery | Pre-2012 deep learning foundation | Stable Diffusion, Imagen, DALL·E 3 |
| **Strengths** | Simple and interpretable | Elegant probabilistic theory | Efficient learning, strong features | Enabled deep learning revival | State-of-the-art generative quality |
| **Weaknesses** | Limited capacity | Impossible to scale | Limited expressiveness | Outperformed by VAEs/Transformers | Heavy compute but improving |
| **Relation to Physics** | Static energy surface | Full Boltzmann law | Simplified Boltzmann form | Composed Boltzmann layers | Stochastic thermodynamics, SDEs |
| **Concept of Randomness** | None | Controlled via temperature | Conditional stochasticity | Layer sampling | Gaussian noise schedule |
| **Does it start from randomness?** | No | Yes | Yes | Yes | Yes — explicitly from pure Gaussian noise |
| **Noise Removal / Annealing** | Not applicable | Simulated annealing | None explicit; the CD negative phase runs a short Gibbs chain | None explicit; each layer is trained with CD | Reverse diffusion = gradual noise removal |
| **Connection to Modern Diffusion Models** | Conceptually related (energy minima) | Directly related (Boltzmann energy) | Related via energy modeling | Historical foundation | The modern extension of stochastic generative modeling |



# Deeper Conceptual Comparison (Narrative)

## 1. Hopfield Network — Deterministic Memory  
A Hopfield network behaves like an energy landscape full of valleys.  
When you present a partial or noisy pattern, the state slides **deterministically** downhill into a nearby valley, usually the closest stored memory, although an overloaded network can also settle into a spurious minimum.  
There is **no randomness**; only energy minimization.

---

## 2. Boltzmann Machine — Randomness and Statistical Physics  
Imagine a marble rolling on a heated metal surface.  
At **high temperature**, the marble jumps everywhere (high randomness).  
As you **lower the temperature**, it settles into deeper and deeper valleys.  
This is a **true probabilistic generative model**, sampling from the Boltzmann distribution.  
Its mathematical foundation connects directly to modern diffusion and stochastic dynamics.

---

## 3. Restricted Boltzmann Machine (RBM) — Simplified Boltzmann  
The RBM removes lateral connections within layers, creating a **bipartite structure**.  
This single structural change makes sampling much faster and learning feasible.  
Hinton introduced **Contrastive Divergence** (2002) to train RBMs efficiently.  
Greedy layer-wise stacking of RBMs then helped reopen the path to deep learning around **2006**.  

---

## 4. Deep Belief Networks — Layers of RBMs  
A DBN stacks RBMs on top of each other.  
Each RBM learns a distribution over the layer below it.  
Through this stacking, a **deep hierarchy of increasingly abstract representations** emerges.  
This was one of the earliest successes of deep learning before the rise of CNNs and Transformers.

---

## 5. Diffusion Models — Continuous Noise Removal  
A diffusion model begins with **pure Gaussian noise**.  
Then, it gradually removes noise by simulating a learned reverse process, written either as a discrete Markov chain (DDPM) or as a reverse-time stochastic differential equation (SDE).  
This process eventually moves toward a **high-probability data sample**.  
The mathematics connects directly to:  
- Boltzmann energy  
- Langevin dynamics  
- Stochastic thermodynamics  

Diffusion is the **modern rebirth** of stochastic generative modeling.

---

# Final Summary (One Sentence Each)

**Hopfield:** deterministic memory retrieval through energy minimization.  
**Boltzmann Machine:** a probabilistic neural network rooted in physics and randomness.  
**RBM:** a simplified Boltzmann model enabling efficient training through a bipartite structure.  
**DBN:** a deep generative hierarchy created by stacking RBMs.  
**Diffusion Models:** generate samples by transforming pure noise into structure through iterative denoising.


# Complete Mathematical Summary of Hopfield, Boltzmann, RBM, DBN, and Diffusion Models  


---

# 1. Hopfield Network — Principal Equations

### Energy Function
$$
E(s)= -\frac12 \sum_i \sum_j w_{ij} s_i s_j - \sum_i b_i s_i
$$

### Local Field
$$
h_i = \sum_j w_{ij} s_j + b_i
$$

### Deterministic Update Rule
$$
s_i \leftarrow \text{sign}(h_i)
$$

### Hebbian Learning (Store \(P\) patterns)
$$
w_{ij}=\frac{1}{P} \sum_{\mu=1}^{P} \xi_i^{\mu}\,\xi_j^{\mu}
$$

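### Energy Decrease Condition
With symmetric weights and \(w_{ii}=0\), updating unit \(i\) from \(s_i\) to \(s_i'=\text{sign}(h_i)\) changes the energy by
$$
\Delta E_i = -(s_i' - s_i)\,h_i \le 0
$$
with equality when the unit does not flip; a flip gives \(\Delta E_i = -2|h_i|\).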
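
The storage and recall equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names (`hebbian_weights`, `energy`, `recall`), the two 8-unit patterns, and the choice to break ties at \(h_i = 0\) toward \(+1\) are all assumptions made for the example.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian rule w_ij = (1/P) * sum_mu xi_i^mu xi_j^mu, with a zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    num_patterns = patterns.shape[0]
    weights = patterns.T @ patterns / num_patterns
    np.fill_diagonal(weights, 0.0)  # no self-connections (w_ii = 0)
    return weights

def energy(weights, bias, state):
    """E(s) = -1/2 * s^T W s - b^T s."""
    return -0.5 * state @ weights @ state - bias @ state

def recall(weights, bias, state, max_sweeps=20):
    """Asynchronous updates s_i <- sign(h_i) until a full sweep changes nothing."""
    state = np.array(state, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(state.size):
            local_field = weights[i] @ state + bias[i]
            new_value = 1.0 if local_field >= 0 else -1.0  # tie broken toward +1 (assumption)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Two orthogonal +1/-1 patterns chosen for illustration.
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                   [1, 1, 1, 1, -1, -1, -1, -1]], dtype=float)
W = hebbian_weights(stored)
b = np.zeros(8)

noisy = stored[0].copy()
noisy[0] = -noisy[0]  # corrupt one bit of the first pattern
recovered = recall(W, b, noisy)
```

Because every accepted flip lowers the energy, `energy(W, b, recovered)` is lower than `energy(W, b, noisy)`, and in this small example the corrupted bit is restored.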

---

# 2. Boltzmann Machine — Principal Equations

### Energy Function
$$
E(s)= -\sum_{i<j} w_{ij} s_i s_j - \sum_i b_i s_i
$$

Here \(s=(v,h)\) collects the visible and hidden units.

### Boltzmann (Gibbs) Distribution
$$
P(s)=\frac{1}{Z} e^{-E(s)/T},
\qquad
Z=\sum_s e^{-E(s)/T}
$$

### Neuron Activation Probability
For binary units \(s_i\in\{0,1\}\):
$$
P(s_i=1\mid \text{rest})=\sigma\!\left(\frac{1}{T}\Big(\sum_j w_{ij}s_j + b_i\Big)\right)
$$

Where  
$$
\sigma(x)=\frac{1}{1+e^{-x}}.
$$

### Learning Rule
$$
\Delta w_{ij}=\eta\left(\langle s_i s_j\rangle_{\text{data}}-\langle s_i s_j\rangle_{\text{model}}\right)
$$

### Gibbs Sampling Update
$$
s_i^{t+1}\sim \text{Bernoulli}\left(\sigma\!\left(\frac{1}{T}\Big(\sum_j w_{ij}s_j^{t}+b_i\Big)\right)\right)
$$
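
As a rough sketch of how the Gibbs update is used, the snippet below resamples each unit in turn from its conditional distribution while the temperature is lowered in stages. The random symmetric weights, the network size, and the four-stage temperature schedule are arbitrary assumptions for illustration; units take values in \(\{0,1\}\) and the whole local field is divided by \(T\).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sweep(weights, bias, state, temperature, rng):
    """One sequential pass over all {0,1} units, resampling each from P(s_i = 1 | rest)."""
    state = state.copy()
    for i in range(state.size):
        local_field = weights[i] @ state + bias[i]
        p_on = sigmoid(local_field / temperature)
        state[i] = 1.0 if rng.random() < p_on else 0.0
    return state

rng = np.random.default_rng(0)
n_units = 6
A = rng.normal(size=(n_units, n_units))
W = (A + A.T) / 2.0          # symmetric couplings (random, for illustration only)
np.fill_diagonal(W, 0.0)     # no self-connections
b = np.zeros(n_units)

s = rng.integers(0, 2, size=n_units).astype(float)
for T in [4.0, 2.0, 1.0, 0.5]:   # crude annealing schedule (assumption)
    for _ in range(50):
        s = gibbs_sweep(W, b, s, T, rng)
```

At high temperature the conditional probabilities sit near 0.5 and the state wanders freely; at low temperature the same update becomes nearly deterministic and favors low-energy configurations.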

---

# 3. Restricted Boltzmann Machine (RBM) — Principal Equations

### Energy Function
$$
E(v,h)= -a^{\top}v - b^{\top}h - v^{\top}Wh
$$

### Joint Distribution
$$
P(v,h)=\frac{1}{Z} e^{-E(v,h)}
$$

### Conditional Distributions  
(No intra-layer connections)

**Visible given hidden**
$$
P(v_i=1\mid h)=\sigma\left(a_i + \sum_j w_{ij}h_j\right)
$$

**Hidden given visible**
$$
P(h_j=1\mid v)=\sigma\left(b_j + \sum_i w_{ij}v_i\right)
$$

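### Contrastive Divergence Learning Rule
$$
\Delta w_{ij}=\eta\left(\langle v_i h_j\rangle_{\text{data}}
-
\langle v_i h_j\rangle_{k}\right)
$$

Here \(\langle\cdot\rangle_{k}\) is the average after \(k\) steps of Gibbs sampling started at the data; it replaces the intractable model expectation \(\langle v_i h_j\rangle_{\text{model}}\).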

### CD-k Chain
$$
v^{(0)} \rightarrow h^{(0)} \rightarrow v^{(1)} \rightarrow h^{(1)} \rightarrow \cdots \rightarrow v^{(k)}
$$
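
The CD-k chain with \(k=1\) can be sketched as a single batched update. This is one common variant, assumed here for illustration: hidden probabilities (rather than sampled hidden states) are used in the statistics, and the function name `cd1_update`, the toy data, the learning rate, and the initialization scale are all hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, v0, lr, rng):
    """One CD-1 step on a batch v0 of shape (batch, n_visible).

    W has shape (n_visible, n_hidden); a and b are visible and hidden biases.
    """
    ph0 = sigmoid(v0 @ W + b)                           # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)    # sample h0
    pv1 = sigmoid(h0 @ W.T + a)                         # P(v = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)    # sample v1 (reconstruction)
    ph1 = sigmoid(v1 @ W + b)                           # P(h = 1 | v1)

    batch = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / batch      # <v h>_data - <v h>_1
    a = a + lr * (v0 - v1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
a = np.zeros(n_visible)
b = np.zeros(n_hidden)

# Two toy binary patterns standing in for a dataset.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
for _ in range(200):
    W, a, b = cd1_update(W, a, b, data, lr=0.1, rng=rng)
```

The bipartite structure is what makes each line a single matrix product: all hidden units are conditionally independent given the visible layer, and vice versa.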

---

# 4. Deep Belief Networks (DBN) — Principal Equations

### Layer-wise RBM Energy (Layer \(l\))
$$
E^{(l)}(v^{(l)},h^{(l)})=
-a^{(l)\top}v^{(l)}-b^{(l)\top}h^{(l)}-v^{(l)\top}W^{(l)}h^{(l)}
$$

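### Greedy Layer-wise Learning
$$
\Delta W^{(l)}=
\eta\left(
\langle v^{(l)}h^{(l)\top}\rangle_{\text{data}}
-
\langle v^{(l)}h^{(l)\top}\rangle_{\text{model}}
\right)
$$

Each layer is trained as an RBM with the layers below frozen; for \(l>1\), the input \(v^{(l)}\) is the hidden activity \(h^{(l-1)}\) of the layer below.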

### DBN Joint Distribution (Two-Layer Example)
$$
P(v,h^{(1)},h^{(2)})
=
P(h^{(1)},h^{(2)})
\prod_i P(v_i\mid h^{(1)})
$$

The top two layers form an undirected RBM; the lower layers form a directed, top-down generative network.

### Downward Generative Process
$$
h^{(2)} \rightarrow h^{(1)} \rightarrow v
$$

### Upward Inference
$$
v \rightarrow h^{(1)} \rightarrow h^{(2)}
$$
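
The two passes can be sketched for a two-hidden-layer DBN as follows. The weights here are random and untrained, so the output is meaningless as data; the point is only the order of operations. The names `dbn_generate` and `dbn_infer`, the layer sizes, and the separate bias `a2` for \(h^{(1)}\) inside the top RBM are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def dbn_generate(W1, a1, W2, a2, b2, gibbs_steps, rng):
    """Downward pass: Gibbs sampling in the top RBM over (h1, h2), then h1 -> v."""
    h1 = sample_bernoulli(np.full(a2.shape, 0.5), rng)       # arbitrary start
    for _ in range(gibbs_steps):
        h2 = sample_bernoulli(sigmoid(h1 @ W2 + b2), rng)    # P(h2 | h1)
        h1 = sample_bernoulli(sigmoid(h2 @ W2.T + a2), rng)  # P(h1 | h2)
    return sample_bernoulli(sigmoid(h1 @ W1.T + a1), rng)    # P(v | h1)

def dbn_infer(W1, b1, W2, b2, v):
    """Upward pass: approximate inference v -> h1 -> h2 using mean activations."""
    h1 = sigmoid(v @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return h1, h2

rng = np.random.default_rng(0)
n_v, n_h1, n_h2 = 8, 5, 3
W1 = 0.1 * rng.normal(size=(n_v, n_h1))   # untrained weights, illustration only
W2 = 0.1 * rng.normal(size=(n_h1, n_h2))
a1, b1 = np.zeros(n_v), np.zeros(n_h1)
a2, b2 = np.zeros(n_h1), np.zeros(n_h2)

v_sample = dbn_generate(W1, a1, W2, a2, b2, gibbs_steps=20, rng=rng)
h1_probs, h2_probs = dbn_infer(W1, b1, W2, b2, v_sample)
```

Generation needs a Gibbs chain only in the top RBM; everything below it is a single directed sampling step per layer.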

---

# 5. Diffusion Models — Principal Equations

### Forward Diffusion (Add Noise)
$$
q(x_t\mid x_{t-1}) =
\mathcal{N}\!\Big(x_t;\, \sqrt{1-\beta_t}\,x_{t-1}, \beta_t I\Big)
$$

Closed-form:
$$
q(x_t\mid x_0)=\mathcal{N}\!\Big(x_t;\, \sqrt{\bar{\alpha}_t}\,x_0,\,(1-\bar{\alpha}_t)I\Big)
$$

Where  
$$
\alpha_t = 1-\beta_t,
\qquad
\bar{\alpha}_t=\prod_{s=1}^t \alpha_s.
$$
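
A short sketch of the closed-form forward process: compute \(\bar{\alpha}_t\) once, then jump from \(x_0\) to any \(x_t\) in a single step. The linear schedule from \(10^{-4}\) to \(0.02\) over \(T=1000\) steps is the one used in the DDPM paper; the function name `q_sample` and the toy input are assumptions for the example.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule beta_1 .. beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise):
    """Draw x_t ~ q(x_t | x_0) as sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise.

    t is 1-indexed to match the equations, so abar_t is alpha_bars[t - 1].
    """
    abar = alpha_bars[t - 1]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise

rng = np.random.default_rng(0)
x0 = np.ones(4)                      # toy "data point"
noise = rng.normal(size=4)
x_mid = q_sample(x0, 500, noise)     # partially noised
x_T = q_sample(x0, T, noise)         # almost pure noise
```

With this schedule \(\bar{\alpha}_T\) is on the order of \(10^{-5}\), so `x_T` is nearly indistinguishable from the noise draw itself, which is why sampling can start from a standard Gaussian.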

---

### Reverse Diffusion
$$
p_\theta(x_{t-1}\mid x_t)=
\mathcal{N}\!\left(x_{t-1};\, \mu_\theta(x_t,t),\,\Sigma_\theta(x_t,t)\right)
$$

### Mean Prediction (Using Noise Predictor)
$$
\mu_\theta(x_t,t)
=
\frac{1}{\sqrt{\alpha_t}}
\left(
x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,
\epsilon_\theta(x_t,t)
\right)
$$

### Noise Prediction Loss
$$
L_{\text{simple}}
=
\mathbb{E}_{x_0,t,\epsilon}\left[ \|\epsilon - \epsilon_\theta(x_t,t)\|^2 \right]
$$
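
To make the loss and the reverse mean concrete, the sketch below evaluates both with an "oracle" noise predictor that knows \(x_0\). No network is trained here; the oracle is a hypothetical stand-in for \(\epsilon_\theta\) that lets the formulas be checked numerically. The reverse mean uses the standard DDPM form with a \(1/\sqrt{\alpha_t}\) prefactor, and the loss is averaged per element, which is the squared norm up to a constant factor.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear schedule (assumption, as in DDPM)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def simple_loss(eps_model, x0, t, noise):
    """Per-element version of L_simple for one (x0, t, noise) draw; t is 1-indexed."""
    abar = alpha_bars[t - 1]
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise
    return np.mean((noise - eps_model(x_t, t)) ** 2)

def reverse_mean(eps_model, x_t, t):
    """mu_theta(x_t, t) = (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)."""
    beta, alpha, abar = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
    return (x_t - beta / np.sqrt(1.0 - abar) * eps_model(x_t, t)) / np.sqrt(alpha)

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)
noise = rng.normal(size=4)
t = 500

def oracle_eps(x_t, t):
    """Hypothetical perfect predictor: recovers the noise exactly because it knows x0."""
    abar = alpha_bars[t - 1]
    return (x_t - np.sqrt(abar) * x0) / np.sqrt(1.0 - abar)

loss = simple_loss(oracle_eps, x0, t, noise)
x_t = np.sqrt(alpha_bars[t - 1]) * x0 + np.sqrt(1.0 - alpha_bars[t - 1]) * noise
mu = reverse_mean(oracle_eps, x_t, t)
```

With a perfect predictor the loss is zero and `mu` coincides with the mean of the true posterior \(q(x_{t-1}\mid x_t, x_0)\), which is the sense in which predicting the noise is equivalent to predicting the denoising step.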

---

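### Reverse SDE (Score-Based Diffusion)
$$
dx=
\Big[
f(x,t)
-
g(t)^2\,\nabla_x \log p_t(x)
\Big]dt
+
g(t)\, d\bar{w}
$$

Here \(\bar{w}\) is a Wiener process running backward in time. (The factor \(\tfrac12 g(t)^2\) appears instead in the deterministic probability-flow ODE, which has no noise term.)

Where the learned **score** approximates:
$$
s_\theta(x,t)\approx\nabla_x \log p_t(x)
$$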

---

### Sampling by Reverse SDE (Euler–Maruyama Discretization)
$$
x_{t-\Delta t}
=
x_t
-
\Big[
f(x_t,t)
-
g(t)^2\,s_\theta(x_t,t)
\Big]\Delta t
+
g(t)\sqrt{\Delta t}\,z,
\qquad
z\sim\mathcal{N}(0,I)
$$
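
A minimal sketch of reverse-SDE sampling, using a case where the score is known exactly so no network is needed. The assumptions: one-dimensional Gaussian data \(\mathcal{N}(2, 0.5^2)\), a variance-preserving forward SDE with constant \(\beta\) (so \(f(x,t)=-\tfrac12\beta x\) and \(g(t)=\sqrt{\beta}\)), and an Euler–Maruyama step of the form \(x_{t-\Delta t} = x_t - [f - g^2 s]\,\Delta t + g\sqrt{\Delta t}\,z\) with the reverse-time drift \(f - g^2\nabla_x\log p_t\). All names and constants are illustrative.

```python
import numpy as np

beta = 1.0                       # constant noise rate (assumption)
data_mean, data_std = 2.0, 0.5   # toy 1-D Gaussian "data distribution"

def marginal_moments(t):
    """Mean and variance of p_t under dx = -0.5*beta*x dt + sqrt(beta) dw."""
    decay = np.exp(-beta * t)
    mean_t = data_mean * np.sqrt(decay)
    var_t = data_std**2 * decay + (1.0 - decay)
    return mean_t, var_t

def score(x, t):
    """Exact score of the Gaussian marginal p_t; stands in for a learned s_theta."""
    mean_t, var_t = marginal_moments(t)
    return -(x - mean_t) / var_t

def reverse_sde_sample(n_samples, horizon=10.0, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    dt = horizon / steps
    x = rng.normal(size=n_samples)             # start from (nearly) pure noise
    for k in range(steps):
        t = horizon - k * dt                   # integrate from t = horizon down to 0
        f = -0.5 * beta * x
        g = np.sqrt(beta)
        drift = f - g**2 * score(x, t)         # reverse-time drift
        z = rng.normal(size=n_samples)
        x = x - drift * dt + g * np.sqrt(dt) * z
    return x

samples = reverse_sde_sample(20000)
```

The samples start as standard normal draws and end up with mean close to 2 and standard deviation close to 0.5, up to discretization and Monte Carlo error. In a real model the only change is that `score` is replaced by a trained network.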

---

# Ultra-Short Summary of Key Equations

| Model | Core Equation |
|-------|----------------|
| Hopfield | $$E=-\frac12\sum w_{ij}s_i s_j$$ |
| Boltzmann Machine | $$P(s)=\frac{1}{Z}e^{-E/T}$$ |
| RBM | $$E=-a^{\top}v - b^{\top}h - v^{\top}Wh$$ |
| DBN | $$P(v,h^{(1)},h^{(2)})=P(h^{(1)},h^{(2)})\prod_i P(v_i\mid h^{(1)})$$ |
| Diffusion | $$q(x_t\mid x_0)=\mathcal{N}(x_t;\sqrt{\bar{\alpha}_t}x_0,(1-\bar{\alpha}_t)I)$$ |

