# The Role of the Vector Field in Score-Based / Diffusion Models

---

## 1. What “Vector Field” Means

The learned object is

$$
s_\theta(x,t) \approx \nabla_x \log p_t(x)
$$

This is a **vector-valued function**:

- **Input:** a point \( x \in \mathbb{R}^d \) (and optionally a time or noise level \( t \))

- **Output:** a vector in \( \mathbb{R}^d \)

So at every point in space (and time), the model outputs:

a direction and a magnitude: the direction in which the log-density \( \log p_t(x) \) increases fastest, and how steeply it increases.

That is the definition of a **vector field**.
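As a concrete illustration, here is a minimal sketch of such a function for a case where the score is known in closed form. The isotropic Gaussian target and the name `gaussian_score` are assumptions chosen for the example; a trained \( s_\theta \) would be a neural network with the same input and output shape.

```python
def gaussian_score(x, mean, var):
    """Score of an isotropic Gaussian N(mean, var * I).

    grad_x log p(x) = -(x - mean) / var, one component per input dimension.
    """
    return [-(xi - mi) / var for xi, mi in zip(x, mean)]


x = [1.0, -2.0, 0.5]
s = gaussian_score(x, mean=[0.0, 0.0, 0.0], var=2.0)

print(len(x), len(s))  # 3 3: input and output dimensions match
print(s)               # [-0.5, 1.0, -0.25]: each component points back toward the mean
```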

---

## 2. Why the Model Must Be a Vector Field (Not a Scalar)

### Classical density models

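These try to learn a scalar function

$$
p(x)
$$

which must integrate to 1. Enforcing that constraint means computing a normalizing constant, an integral over all of \( \mathbb{R}^d \) that is generally intractable in high dimensions.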

### Score-based models

These learn **directional information only**: the gradient of the log-density, not the density itself.

No normalization is required.

The only requirement is:

input dimension = output dimension.

This is why Yang Song explicitly states:

“The only requirement on the score-based model is that it should be a vector-valued function with the same input and output dimensionality.”
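The reason no normalization is required can be made explicit. Write the density as an unnormalized function divided by its normalizing constant (the symbols \( \tilde{p} \) and \( Z \) are introduced here for illustration). Because \( Z \) does not depend on \( x \), it drops out of the score:

$$
p(x) = \frac{\tilde{p}(x)}{Z},
\qquad
\nabla_x \log p(x)
= \nabla_x \log \tilde{p}(x) - \nabla_x \log Z
= \nabla_x \log \tilde{p}(x)
$$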

Without the vector field:

- No gradients  
- No sampling  
- No reverse diffusion  
- No generative process at all  

---

## 3. The Vector Field Defines Probability Geometry

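Think geometrically, in terms of the energy landscape \( E(x) = -\log p(x) \).

- High-density regions are basins (low energy)  
- Low-density regions are hills (high energy)  

The score vector at \( x \):

- points uphill in log-probability, which is downhill in energy, toward the basins  
- is orthogonal to the level sets of \( p(x) \) wherever it is nonzero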

So the vector field:

- encodes how probability mass flows  
- defines the shape of the distribution without ever writing it down  

You never see the density.  
You only feel its force field.
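The two geometric properties of the score (it points uphill in log-probability and is orthogonal to level sets) can be checked numerically. The sketch below assumes a toy 2-D Gaussian with variances 1 and 4; the function names are hypothetical and chosen for the example.

```python
import math


def log_density(x, y):
    """Unnormalized log-density of a zero-mean Gaussian with variances 1 and 4."""
    return -0.5 * (x * x / 1.0 + y * y / 4.0)


def score(x, y):
    """Analytic gradient of log_density with respect to (x, y)."""
    return (-x / 1.0, -y / 4.0)


def directional_derivative(x, y, dx, dy, h=1e-5):
    """Central finite difference of log_density along the direction (dx, dy)."""
    forward = log_density(x + h * dx, y + h * dy)
    backward = log_density(x - h * dx, y - h * dy)
    return (forward - backward) / (2 * h)


x, y = 1.5, -2.0
sx, sy = score(x, y)
norm = math.hypot(sx, sy)
along = (sx / norm, sy / norm)   # unit vector along the score
tangent = (-along[1], along[0])  # rotated 90 degrees: tangent to the level set

# Positive: log-density increases along the score direction.
print(directional_derivative(x, y, *along))
# Approximately zero: log-density does not change along the level set.
print(directional_derivative(x, y, *tangent))
```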

---

## 4. Sampling Equals Following the Vector Field

### Langevin dynamics

$$
x_{k+1} = x_k + \epsilon \, s_\theta(x_k) + \sqrt{2\epsilon} \, \xi_k
$$

Interpretation (with step size \( \epsilon > 0 \) and \( \xi_k \sim \mathcal{N}(0, I) \)):

- Deterministic part follows the vector field  
- Noise enables exploration  

Without the vector field:

- The update reduces to a pure random walk  
- There is no attraction to the data manifold  

The vector field is what pulls samples from noise into structure.
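The Langevin update above can be sketched in a few lines. This example assumes a 1-D Gaussian target N(3, 1) whose score is known exactly, standing in for a learned \( s_\theta \); the step size, step count, and starting point are illustrative choices, not recommendations.

```python
import math
import random


def score(x, mean=3.0, var=1.0):
    """Exact score of the toy target N(mean, var); stands in for a learned model."""
    return -(x - mean) / var


def langevin_sample(score_fn, x0, step, n_steps, rng):
    """Run x_{k+1} = x_k + step * score(x_k) + sqrt(2 * step) * xi_k."""
    x = x0
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)
        x = x + step * score_fn(x) + math.sqrt(2.0 * step) * xi
    return x


rng = random.Random(0)
# Every chain starts far from the target, at x0 = -10.
samples = [langevin_sample(score, x0=-10.0, step=0.01, n_steps=1000, rng=rng)
           for _ in range(500)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
print(sample_mean, sample_var)  # should be close to the target mean 3 and variance 1
```

Because the step size is finite, the chain's stationary distribution differs slightly from the target; shrinking `step` reduces that bias at the cost of more iterations.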

---

## 5. Why Low-Density Regions Were Fatal (And Why Vector Fields Fix This)

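A key failure mode highlighted by Song:

The score matching objective is an expectation under the data density

$$
p(x)
$$

so errors are weighted by how often each point is seen. The vector field is therefore poorly learned where \( p(x) \) is small.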

But sampling **starts** in low-density regions.

This is why naive score fields fail.

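Perturbing the data with noise at several scales, and learning a time-dependent (noise-conditional) vector field, means that:

- Training samples cover the regions where sampling starts, so the vector field is learned accurately there  
- Sampling trajectories are guided from the first step instead of wandering  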
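The effect of noise perturbation can be illustrated on a tiny dataset. The sketch below assumes three 1-D data points perturbed with Gaussian noise of standard deviation `sigma`, so the perturbed density is an equal-weight Gaussian mixture with a closed-form score. The helper names are hypothetical.

```python
import math


def smoothed_density(x, data, sigma):
    """Density of the empirical distribution of `data` convolved with N(0, sigma^2)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    terms = [math.exp(-(x - xi) ** 2 / (2.0 * sigma ** 2)) / norm for xi in data]
    return sum(terms) / len(data)


def smoothed_score(x, data, sigma):
    """Score of the smoothed density: softmax-weighted average of (x_i - x) / sigma^2."""
    logits = [-(x - xi) ** 2 / (2.0 * sigma ** 2) for xi in data]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return sum(w * (xi - x) for w, xi in zip(weights, data)) / (total * sigma ** 2)


data = [-1.0, 0.0, 1.0]
far_point = 20.0  # a typical starting location for sampling, far from the data

for sigma in (0.1, 1.0, 10.0):
    print(sigma,
          smoothed_density(far_point, data, sigma),
          smoothed_score(far_point, data, sigma))
```

At small `sigma` the density at the far point is vanishingly small, so essentially no perturbed training sample lands there and a learned score gets no signal. At large `sigma` the density there is appreciable, which is the regime where sampling begins.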

---

## 6. Vector Field as the Drift of the Reverse-Time SDE

In the SDE framework:

### Forward process (destruction)

$$
dx = f(x,t)\,dt + g(t)\,dW_t
$$

### Reverse process (generation)

$$
dx =
\left[
f(x,t) - g(t)^2 \nabla_x \log p_t(x)
\right] dt
+ g(t)\, d\bar{W}_t
$$

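The score vector field is **literally the drift correction term**. Time runs backward here, from \( t = T \) to \( t = 0 \), and \( \bar{W}_t \) is a Brownian motion in reverse time.

Meaning:

- The vector field defines the generative dynamics  
- Learning the vector field amounts to learning how to undo the noising process  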
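The reverse process can be simulated with a simple Euler-Maruyama discretization. The sketch below assumes the simplest possible case, which is not taken from the text: \( f = 0 \), \( g = 1 \), and 1-D Gaussian data N(2, 0.5²). Under that assumption \( p_t \) is N(2, 0.5² + t) and the exact score stands in for a learned model.

```python
import math
import random

DATA_MEAN, DATA_STD = 2.0, 0.5  # toy 1-D "data distribution" N(2, 0.5^2)
T = 4.0                         # final diffusion time


def score(x, t):
    """Exact score of p_t = N(DATA_MEAN, DATA_STD^2 + t) when f = 0 and g = 1."""
    return -(x - DATA_MEAN) / (DATA_STD ** 2 + t)


def reverse_sde_sample(rng, n_steps=500):
    """Integrate the reverse-time SDE from t = T down to t = 0."""
    dt = T / n_steps
    # Start from p_T. It is known exactly here; in practice a fixed prior approximates it.
    x = rng.gauss(DATA_MEAN, math.sqrt(DATA_STD ** 2 + T))
    for k in range(n_steps):
        t = T - k * dt
        drift = 0.0 - 1.0 ** 2 * score(x, t)  # f - g^2 * score
        # Stepping backward in time: subtract drift * dt, then add fresh noise.
        x = x - drift * dt + 1.0 * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
samples = [reverse_sde_sample(rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
print(mean, std)  # should be close to DATA_MEAN and DATA_STD
```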

---

## 7. Probability Flow ODE: Vector Field Without Randomness

The deterministic ODE formulation is

$$
dx =
\left[
f(x,t) - \frac{1}{2} g(t)^2 s_\theta(x,t)
\right] dt
$$

Here:

- The vector field becomes a deterministic velocity field  
- The entire generative model is a continuous flow  

This is why:

- Likelihoods can be computed via the change-of-variables formula, up to numerical ODE solver error  
- The model becomes a continuous normalizing flow  
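A minimal sketch of the deterministic flow, under the same illustrative assumptions as a toy setting with \( f = 0 \), \( g = 1 \), and Gaussian data N(2, 0.5²) (none of which come from the text). In that setting the ODE has a closed-form solution, so the Euler integration can be compared against it.

```python
import math

DATA_MEAN, DATA_STD, T = 2.0, 0.5, 4.0


def score(x, t):
    """Exact score of p_t = N(DATA_MEAN, DATA_STD^2 + t) when f = 0 and g = 1."""
    return -(x - DATA_MEAN) / (DATA_STD ** 2 + t)


def probability_flow(x_T, n_steps=1000):
    """Euler-integrate dx/dt = f - (1/2) g^2 * score from t = T down to t = 0."""
    dt = T / n_steps
    x = x_T
    for k in range(n_steps):
        t = T - k * dt
        velocity = 0.0 - 0.5 * 1.0 ** 2 * score(x, t)
        x = x - velocity * dt  # backward-in-time Euler step, no noise term
    return x


def closed_form(x_T):
    """Exact solution for this toy case: the offset from the mean scales with sqrt(variance)."""
    scale = DATA_STD / math.sqrt(DATA_STD ** 2 + T)
    return DATA_MEAN + scale * (x_T - DATA_MEAN)


for x_T in (-3.0, 0.0, 5.0):
    print(x_T, probability_flow(x_T), closed_form(x_T))
```

The same starting point always maps to the same endpoint, which is what makes the flow invertible and the likelihood computable.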

---

## 8. Inverse Problems: Vector Field as a Bayesian Update Engine

Bayes’ rule written in score form (the evidence \( p(y) \) does not depend on \( x \), so its gradient vanishes):

$$
\nabla_x \log p(x \mid y)
=
\nabla_x \log p(x)
+
\nabla_x \log p(y \mid x)
$$

So:

- Prior vector field plus measurement vector field  
- Add them to obtain the posterior vector field  
- Sample using the same machinery  

With many other generative models, incorporating a new measurement model typically requires retraining.
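The score form of Bayes' rule can be checked on a case with a known answer. The sketch below assumes a 1-D Gaussian prior N(0, 1) and a measurement \( y = x + \) noise with variance 0.25; both are illustrative choices, and the function names are hypothetical. The posterior is then Gaussian, and the summed score should vanish at the posterior mean.

```python
PRIOR_MEAN, PRIOR_VAR = 0.0, 1.0
NOISE_VAR = 0.25


def prior_score(x):
    """grad_x log p(x) for the prior N(PRIOR_MEAN, PRIOR_VAR)."""
    return -(x - PRIOR_MEAN) / PRIOR_VAR


def likelihood_score(x, y):
    """grad_x log p(y | x) for the measurement model y = x + N(0, NOISE_VAR)."""
    return (y - x) / NOISE_VAR


def posterior_score(x, y):
    """Posterior score as the sum of the prior and measurement vector fields."""
    return prior_score(x) + likelihood_score(x, y)


y = 1.0
# Standard Gaussian conjugacy: precisions add, and the mean is precision-weighted.
posterior_precision = 1.0 / PRIOR_VAR + 1.0 / NOISE_VAR
posterior_mean = (PRIOR_MEAN / PRIOR_VAR + y / NOISE_VAR) / posterior_precision

# Approximately zero: the summed field vanishes at the posterior mean.
print(posterior_mean, posterior_score(posterior_mean, y))
```

Plugging `posterior_score` into a Langevin or reverse-SDE sampler in place of the prior score would then draw from the posterior, with no change to the prior model.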

---

## 9. The Deepest Insight (Why This Works at All)

A vector field:

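- Does not need to be exactly integrable: the true score is a gradient field, but the learned \( s_\theta \) is not constrained to be one  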
- Does not need a closed-form density  
- Only needs to be locally correct  

Score matching exploits this fact.

You never reconstruct

$$
p(x)
$$

You only learn how probability wants to move.

This is one reason diffusion models scale to:

- \(1024 \times 1024\) images  
- Audio waveforms  
- Medical reconstruction  
- Physics-inspired generation  

---

## Final One-Sentence Truth

In score-based diffusion models, the vector field **is** the probability distribution (it determines the density up to normalization), expressed not as “what is likely,” but as “where to move.”
