# https://web.stanford.edu/class/cs379c/archive/2012/suggested_reading_list/documents/LeCunetal06.pdf

# https://www.cs.toronto.edu/~fritz/absps/cogscibm.pdf

1. Who First Used “Energy” in a Generative Neural Network?

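The first neural network to use “energy” to define a generative model is the Boltzmann machine:

David Ackley, Geoffrey Hinton, and Terrence Sejnowski (1985)  
Paper: “A Learning Algorithm for Boltzmann Machines”, Cognitive Science 9(1)  
(Hinton and Sejnowski had described the model earlier, in “Optimal Perceptual Inference”, 1983.)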

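Before this work:

Neural networks that learned from data were essentially discriminative (e.g., perceptrons).

None of them defined a probability distribution over its states.

Hopfield (1982) had already borrowed an energy function from physics, but used it for deterministic associative memory, without temperature, sampling, or a Boltzmann distribution.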

Hinton and his co-authors brought into neural networks:

Energy function E(x, h) over visible units x and hidden units h

Boltzmann distribution

$$
p(x,h)=\frac{e^{-E(x,h)/T}}{Z}
$$

Temperature T (from statistical physics)

Partition function Z, the sum of e^{-E(x,h)/T} over all joint states (x, h)

Sampling via Gibbs sampling and simulated annealing

Learning by lowering the energy of data states and raising the energy of states the network produces when running freely

This is the birth of energy-based generative models.
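A minimal sketch of these pieces, assuming a hypothetical 3-unit network with made-up weights in which the visible and hidden units are treated together as one binary state vector s (so E(x, h) becomes E(s)). It computes the Boltzmann distribution exactly by enumerating all 2^n states, which is only feasible at toy sizes:

```python
import itertools
import math

# Hypothetical toy network: W is symmetric with a zero diagonal, units are 0/1.
W = [[0.0, 1.5, -1.0],
     [1.5, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
b = [-0.5, 0.2, 0.1]


def energy(s):
    """E(s) = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i."""
    n = len(s)
    pairs = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pairs - sum(b[i] * s[i] for i in range(n))


def boltzmann_distribution(T=1.0):
    """Exact p(s) = exp(-E(s)/T) / Z, enumerating all 2^n joint states."""
    states = list(itertools.product([0, 1], repeat=len(b)))
    weights = [math.exp(-energy(s) / T) for s in states]
    Z = sum(weights)  # partition function
    return {s: w / Z for s, w in zip(states, weights)}, Z


p, Z = boltzmann_distribution(T=1.0)
most_likely = max(p, key=p.get)
print(most_likely, energy(most_likely), p[most_likely])
```

For these weights the most probable state is the lowest-energy one, (1, 1, 0); raising T flattens the distribution toward uniform.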

Conclusion:  
Hinton and his collaborators pioneered energy-based generative modeling.  
All later EBMs, including LeCun’s, build on this foundational idea.

2. What Does the LeCun Paper Do?

Paper: “A Tutorial on Energy-Based Learning” (LeCun, Chopra, Hadsell, Ranzato, and Huang, 2006)

The LeCun tutorial does not introduce energy-based models.  
It does something different:

It generalizes the idea of “energy”

from probabilistic, physics-inspired generative models  
to learning machines that need not be probabilistic and are often discriminative.

LeCun’s framework does not depend on:

Probability distributions  
Temperature  
Normalization by a partition function Z (often an intractable sum or integral)  
Sampling

LeCun extends “energy” to any model that assigns a scalar E(Y, X) to each pairing of an input X with a candidate answer Y:

Low energy = compatible  
High energy = incompatible
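As a minimal sketch of this view, assume a hypothetical classifier whose energy E(Y, X) is the squared distance between the input X and a made-up prototype for each answer Y. Inference picks the answer with the lowest energy; the energies are never normalized and need not sum to anything:

```python
# Hypothetical prototypes for each possible answer Y.
PROTOTYPES = {"cat": (1.0, 0.0), "dog": (0.0, 1.0), "bird": (1.0, 1.0)}


def energy(y, x):
    """E(Y, X): squared distance from input x to the prototype of answer y."""
    cx, cy = PROTOTYPES[y]
    return (x[0] - cx) ** 2 + (x[1] - cy) ** 2


def infer(x):
    """Energy-based inference: Y* = argmin_Y E(Y, X)."""
    return min(PROTOTYPES, key=lambda y: energy(y, x))


print(infer((0.9, 0.1)))  # cat
print(infer((0.8, 0.9)))  # bird
```

Any function that ranks answers sensibly could replace the squared distance; only the ordering of energies matters for inference.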

The tutorial covers:

Energy-based inference  
Energy shaping  
Margin-based losses  
Contrastive losses  
Perceptron and hinge losses  
Structured prediction with energies  
CRFs, max-margin Markov networks  
Graph-transformer networks  
Distances and compatibility functions  
Sequence labeling  
Factor graphs  
Many architectures, not just generative ones
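For structured prediction and sequence labeling, the energy is typically a sum of factor terms, and inference is a search for the lowest-energy output. A minimal sketch, assuming a hypothetical chain factor graph with made-up unary and pairwise energies, uses min-sum dynamic programming (the Viterbi algorithm) and checks it against brute-force enumeration:

```python
import itertools


def viterbi_min_energy(unary, pairwise):
    """Min-sum dynamic programming on a chain factor graph.

    unary[t][k]    : energy of label k at position t (e.g. computed from X by a network)
    pairwise[j][k] : energy of label j being followed by label k
    Returns (labels, energy) for the lowest-energy label sequence.
    """
    n_labels = len(unary[0])
    cost = list(unary[0])  # lowest energy of any prefix ending in each label
    back = []              # back[t-1][k]: best previous label when position t has label k
    for t in range(1, len(unary)):
        new_cost, ptr = [], []
        for k in range(n_labels):
            j = min(range(n_labels), key=lambda prev: cost[prev] + pairwise[prev][k])
            new_cost.append(cost[j] + pairwise[j][k] + unary[t][k])
            ptr.append(j)
        cost = new_cost
        back.append(ptr)
    k = min(range(n_labels), key=lambda lab: cost[lab])
    best_energy = cost[k]
    labels = [k]
    for ptr in reversed(back):  # follow back-pointers from the last position
        k = ptr[k]
        labels.append(k)
    return labels[::-1], best_energy


def chain_energy(labels, unary, pairwise):
    """Total energy E(Y, X): sum of unary factors plus sum of pairwise factors."""
    e = sum(unary[t][y] for t, y in enumerate(labels))
    return e + sum(pairwise[a][c] for a, c in zip(labels, labels[1:]))


def brute_force_min(unary, pairwise):
    """Check by enumerating every label sequence (exponential; toy sizes only)."""
    seqs = itertools.product(range(len(unary[0])), repeat=len(unary))
    return min(chain_energy(list(s), unary, pairwise) for s in seqs)


# Hypothetical numbers: 3 positions, 2 labels; pairwise energies penalize label switches.
unary = [[0.0, 2.0], [1.0, 0.5], [0.2, 1.0]]
pairwise = [[0.0, 1.0], [1.0, 0.0]]
print(viterbi_min_energy(unary, pairwise))  # ([0, 0, 0], 1.2)
```

The dynamic program costs O(T·K²) for T positions and K labels, while enumeration costs O(K^T); no probabilities appear anywhere.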

Key point:  
LeCun’s EBMs do not require probabilities.  
They form a general framework in which probabilistic models are one special case.

3. The Fundamental Differences Between the Two Papers

Here is the precise scientific comparison.

Difference 1: Purpose of the Model

Hinton (1985)

A probabilistic generative model.

Goal: learn a joint distribution over visible and hidden units whose marginal over the visible units matches the data.

Sampling: Gibbs sampling combined with simulated annealing.

Motivation: biology and statistical physics.
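The Gibbs sampling step can be sketched as follows, assuming the same kind of hypothetical 3-unit toy network with made-up weights. Each unit is resampled in turn from its conditional p(s_i = 1 | rest) = 1 / (1 + e^{-gap_i/T}), where gap_i is the energy drop from switching unit i on; after many sweeps, the visit frequencies approach the exact Boltzmann distribution:

```python
import itertools
import math
import random

# Hypothetical toy network (made-up weights); units are binary 0/1.
W = [[0.0, 1.5, -1.0],
     [1.5, 0.0, 0.5],
     [-1.0, 0.5, 0.0]]
b = [-0.5, 0.2, 0.1]


def energy(s):
    n = len(s)
    pairs = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pairs - sum(b[i] * s[i] for i in range(n))


def gibbs_frequencies(n_sweeps, T=1.0, seed=0):
    """Visit units in turn and set s_i = 1 with probability sigmoid(gap_i / T)."""
    rng = random.Random(seed)
    s = [0] * len(b)
    counts = {}
    for _ in range(n_sweeps):
        for i in range(len(s)):
            # gap_i = E(s with s_i = 0) - E(s with s_i = 1) = b_i + sum_j w_ij s_j
            gap = b[i] + sum(W[i][j] * s[j] for j in range(len(s)) if j != i)
            s[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-gap / T)) else 0
        counts[tuple(s)] = counts.get(tuple(s), 0) + 1
    return {state: c / n_sweeps for state, c in counts.items()}


def exact_distribution(T=1.0):
    """Reference answer by enumeration (toy sizes only)."""
    states = list(itertools.product([0, 1], repeat=len(b)))
    weights = [math.exp(-energy(st) / T) for st in states]
    Z = sum(weights)
    return {st: w / Z for st, w in zip(states, weights)}


sampled = gibbs_frequencies(50_000)
exact = exact_distribution()
for state in sorted(exact):
    print(state, round(sampled.get(state, 0.0), 3), round(exact[state], 3))
```

Sampling never touches Z, which is what makes it usable when enumeration is impossible; the price is that many sweeps are needed to reach equilibrium.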

LeCun (2006)

A general mathematical framework for learning with energies.

Does not need probabilities.

Does not require tractable Z.

Useful for classification, ranking, structured prediction, regression.

Summary:  
Hinton = generative probability model  
LeCun = general energy framework (probabilistic and non-probabilistic)

Difference 2: Role of Energy

Hinton (1985): Energy defines a probability over network states x.

$$
p(x)=\frac{e^{-E(x)/T}}{Z}
$$

Low energy → high probability.

Energy equals temperature times negative log probability, up to an additive constant (T log Z) shared by all states.
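Taking the logarithm of the Boltzmann distribution shows exactly how energy and probability are related:

$$
\log p(x) = -\frac{E(x)}{T} - \log Z
\quad\Longrightarrow\quad
E(x) = -T\log p(x) - T\log Z
$$

Because T log Z is the same for every x, it cancels when two states are compared:

$$
E(x_1) - E(x_2) = -T\log\frac{p(x_1)}{p(x_2)}
$$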

LeCun (2006): Energy is compatibility, not probability.

Low energy = good answer (no normalization needed).

Energy is not tied to probability unless you choose to convert it.
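When a conversion is wanted, the standard route is a Gibbs distribution over the answers, with β acting as an inverse temperature (β = 1/T). A minimal sketch, assuming a small finite set of answers with hypothetical energies:

```python
import math


def energies_to_probs(energies, beta=1.0):
    """Gibbs distribution over a finite set of answers:
    P(y) = exp(-beta * E(y)) / sum_y' exp(-beta * E(y')).
    Subtracting the minimum energy first avoids overflow; the shift cancels out."""
    e_min = min(energies.values())
    unnorm = {y: math.exp(-beta * (e - e_min)) for y, e in energies.items()}
    z = sum(unnorm.values())
    return {y: u / z for y, u in unnorm.items()}


# Hypothetical energies for three answers (any real numbers are allowed).
energies = {"cat": 0.02, "dog": 1.62, "bird": 0.82}
soft = energies_to_probs(energies, beta=1.0)
sharp = energies_to_probs(energies, beta=10.0)
print(soft)
print(sharp)
```

A larger β (lower temperature) concentrates probability on the minimum-energy answer, but the ranking of answers, and therefore the argmin, does not change.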

Summary:  
Hinton = energy → probability  
LeCun = energy → scoring function

Difference 3: Temperature

Hinton:

Temperature T is critical:

Used during annealing  
Controls stochasticity  
Comes from physics  
Helps escape local minima  
Governs the Gibbs distribution
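The role of temperature in escaping local minima can be sketched with simulated annealing on a hypothetical 4-unit network with made-up weights (enumeration shows it has more than one local minimum). Gibbs updates run while T is lowered geometrically, and a final zero-temperature pass flips units only while a flip strictly lowers the energy; the cooling schedule and the final quench are illustrative choices, not the paper’s exact procedure:

```python
import itertools
import math
import random

# Hypothetical 4-unit network (made-up weights); units are binary 0/1.
W = [[0.0, 2.0, -1.0, 1.0],
     [2.0, 0.0, 1.0, -2.0],
     [-1.0, 1.0, 0.0, 1.5],
     [1.0, -2.0, 1.5, 0.0]]
b = [-1.0, 0.5, -0.5, 0.2]


def energy(s):
    n = len(s)
    pairs = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pairs - sum(b[i] * s[i] for i in range(n))


def gap(s, i):
    """E(s with s_i = 0) - E(s with s_i = 1)."""
    return b[i] + sum(W[i][j] * s[j] for j in range(len(s)) if j != i)


def anneal(T_start=5.0, T_end=0.05, n_steps=200, seed=0):
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in b]
    ratio = (T_end / T_start) ** (1.0 / (n_steps - 1))  # geometric cooling
    T = T_start
    for _ in range(n_steps):
        for i in range(len(s)):  # one Gibbs sweep at the current temperature
            p_on = 1.0 / (1.0 + math.exp(-gap(s, i) / T))
            s[i] = 1 if rng.random() < p_on else 0
        T *= ratio
    improved = True  # zero-temperature quench: strictly downhill flips only
    while improved:
        improved = False
        for i in range(len(s)):
            delta_e = -gap(s, i) if s[i] == 0 else gap(s, i)  # E(after) - E(before)
            if delta_e < 0:
                s[i] = 1 - s[i]
                improved = True
    return s


def brute_force_minimum():
    """Exact lowest-energy state by enumeration (toy sizes only)."""
    return min(itertools.product([0, 1], repeat=len(b)), key=energy)


best = min((anneal(seed=k) for k in range(20)), key=energy)
print(best, energy(best), brute_force_minimum())
```

Taking the best of several annealing runs is a common safeguard, since a single run can still freeze into a local minimum.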

LeCun:

Temperature is optional.  
It appears only when energies are converted to probabilities, usually as an inverse temperature β = 1/T.  
Most EBMs never need it.

Summary:  
Hinton = temperature is essential  
LeCun = temperature is optional or irrelevant

Difference 4: Training Method

Hinton (1985):

Training uses:

Gradient of log-likelihood  
Two expectations (positive phase and negative phase)  
Requires Markov Chain Monte Carlo  
Requires sampling (Gibbs)  
Computes equilibrium statistics  
Difficult to scale

This is the ancestor of contrastive divergence, restricted Boltzmann machines (RBMs), and deep belief networks (DBNs).
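The positive and negative phases can be made concrete with a minimal sketch, assuming a hypothetical fully visible Boltzmann machine at T = 1 (no hidden units) that is small enough for the model expectation to be computed exactly by enumeration, instead of by the sampling the 1985 algorithm relies on:

```python
import itertools
import math


def energy(s, W, b):
    n = len(s)
    pairs = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pairs - sum(b[i] * s[i] for i in range(n))


def log_likelihood(data, W, b):
    """Average log p(s) under a fully visible Boltzmann machine at T = 1."""
    states = list(itertools.product([0, 1], repeat=len(b)))
    log_z = math.log(sum(math.exp(-energy(s, W, b)) for s in states))
    return sum(-energy(s, W, b) - log_z for s in data) / len(data)


def weight_gradient(data, W, b):
    """dLL/dw_ij = <s_i s_j>_data (positive phase) - <s_i s_j>_model (negative phase).
    The negative phase is exact here; a real Boltzmann machine estimates it by sampling."""
    n = len(b)
    states = list(itertools.product([0, 1], repeat=n))
    unnorm = [math.exp(-energy(s, W, b)) for s in states]
    z = sum(unnorm)
    probs = [u / z for u in unnorm]
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pos = sum(s[i] * s[j] for s in data) / len(data)
            neg = sum(p * s[i] * s[j] for p, s in zip(probs, states))
            grad[i][j] = grad[j][i] = pos - neg
    return grad


# Hypothetical training data and an untrained (all-zero) model.
data = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
W = [[0.0] * 3 for _ in range(3)]
b = [0.0] * 3
g = weight_gradient(data, W, b)
print(g[0][1])  # 0.75 - 0.25 = 0.5
```

A gradient-ascent step w_ij += η·g[i][j] lowers the energy of configurations that appear in the data and raises it for configurations the model favors on its own.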

LeCun (2006):

Training uses:

Margin losses  
Perceptron loss  
Contrastive losses  
Hinge loss  
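Negative log-likelihood loss  
Structured losses

Most of these losses need no sampling; the negative log-likelihood loss is the exception, since it requires normalizing over all possible answers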

Inference by minimization, not sampling
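Two of these losses can be sketched directly on a table of energies, assuming a hypothetical finite answer set. The perceptron loss compares the correct answer with the lowest-energy answer; the hinge loss compares it with the most offending incorrect answer and demands a margin m:

```python
def perceptron_loss(energies, correct):
    """E(correct) - min_y E(y): zero when the correct answer has the lowest energy."""
    return energies[correct] - min(energies.values())


def hinge_loss(energies, correct, margin=1.0):
    """max(0, m + E(correct) - E(most offending incorrect answer))."""
    offending = min(e for y, e in energies.items() if y != correct)
    return max(0.0, margin + energies[correct] - offending)


# Hypothetical energies produced by some model for one training input.
energies = {"cat": 0.3, "dog": 0.9, "bird": 2.0}
print(perceptron_loss(energies, "cat"))  # 0.0
print(hinge_loss(energies, "cat"))       # about 0.4: "dog" is within the margin
```

Neither loss needs a partition function or samples. Note that the perceptron loss has no margin, so a flat energy surface where all answers tie also gives zero loss; the margin in the hinge loss rules that out.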

Summary:  
Hinton = generative training via sampling  
LeCun = discriminative or general training via energy shaping

Difference 5: Type of Tasks

Hinton:

Generative modeling  
Density modeling  
Missing data prediction  
Sampling and reconstruction  
Unsupervised learning

LeCun:

Classification  
Ranking  
Detection  
Sequence labeling  
Structured output  
Optimization problems  
CRFs, max-margin Markov networks, other graphical models

Summary:  
Hinton = generative  
LeCun = general-purpose machine learning

Difference 6: Normalization

Hinton:

Requires computing or approximating Z, the partition function.

LeCun:

LeCun’s EBMs drop the normalization requirement: Z is needed only when a calibrated probability is wanted.

This is the largest conceptual shift.
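Why dropping Z is harmless for inference follows from writing out the Gibbs distribution over answers (with β > 0) and noting that Z(x) does not depend on y:

$$
\hat{y}=\arg\max_{y}\frac{e^{-\beta E(y,x)}}{Z(x)}=\arg\min_{y}E(y,x),\qquad Z(x)=\sum_{y}e^{-\beta E(y,x)}
$$

By contrast, a Boltzmann machine with n binary units has Z = Σ over s ∈ {0,1}^n of e^{-E(s)/T}, a sum of 2^n terms; for n = 100 that is about 1.27 × 10^30 terms, which is why Z must be approximated.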

Final Answer

Who first used energy in a generative neural network?  
Geoffrey Hinton and colleagues (Ackley, Hinton, and Sejnowski, 1985): the Boltzmann machine.

What is the difference between the two papers?

| Topic | Hinton (1985) | LeCun (2006) |
|-------|----------------|----------------|
| Model type | Probabilistic generative model | General energy-based framework |
| Energy meaning | Negative log-probability (times T, up to a constant) | Compatibility score |
| Temperature | Essential | Optional |
| Partition function Z | Required | Not required |
| Learning | Stochastic sampling and likelihood | Direct energy shaping and loss functions |
| Tasks | Generative modeling | Classification, ranking, structured prediction |
| Architecture | Boltzmann Machine | Many architectures (CRFs, neural nets, GTN, etc.) |
