---
title: "The Boltzmann Equation - 4. Maximum Entropy"
author: "Daniel J Smith"
date: "2024-04-04"
categories: [Mathematics, Probability Theory, Information Theory, Boltzmann Equation]
title-block-banner: false
image: 'preview.png'
draft: false
description:  "We prove Cover's theorem from 'Elements of Information Theory' that the distribution function maximising the entropy over functions with given moment constraints takes the form of an exponential function. This result, combined with the H-theorem, provides a rigorous justification for our physical belief that the limiting form of a solution of the Boltzmann equation is a Maxwell-Boltzmann distribution."
---


**Table of contents**<a id='toc0_'></a>    
1. [Maximum entropy](#toc1_)    
1.1. [Theorem 1.1](#toc1_1_)    
1.2. [Remark 1.2](#toc1_2_)    
1.3. [Remark 1.3](#toc1_3_)    
1.4. [Remark 1.4](#toc1_4_)    
2. [References](#toc2_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=true
	minLevel=1
	maxLevel=5
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# 1. <a id='toc1_'></a>[Maximum entropy](#toc0_)

Consider the following problem as posed by Cover[1]:

> Maximise the differential entropy $h(f)$ over all densities $f$ satisfying
> 
> 1.  $f(x)\geq0$ with equality outside of a given set $S$. <br/><br/>
>
>
> 2.  $\int_Sf(x)\,\text{d}x = 1$ <br/><br/>
>
> 
> 3.  $\int_S f(x)r_i(x)\,\text{d}x = \alpha_i$ 
>
>     for given functions $r_i$,
>        $\,i = 1,\dots,m.$

That is, we wish to maximise the entropy over all probability
distributions supported on the set $S$ satisfying the $m$ given moment
constraints $\mathbb{E}[r_i(X)] = \alpha_i, \,\,i = 1,\dots,m.$

It is natural to conjecture that the solution to this
optimization problem takes the form of an exponential function. Indeed,
we prove this in the following theorem by using the [Information Inequality](https://danieljamessmith.github.io/blog/posts/bte3/#theorem-2.2.3---information-inequality):

---

## 1.1. <a id='toc1_1_'></a>[Theorem 1.1](#toc0_)
For $x\in S$ define

$$f^*(x) = \exp{\left[\lambda_0 + \sum_{i=1}^m \lambda_i r_i(x)\right]}$$

where $\lambda_0, \lambda_1,\dots,\lambda_m$ are chosen such that 

$$\int_Sf^*=1,\quad\int_Sf^*r_i=\alpha_i,\quad i=1,\dots,m.$$

Then $f^*$ uniquely maximises the entropy $h(f)$ over all densities $f$ satisfying conditions 1, 2 and 3 as stated above.

> *Proof.*
> 
>
> Let $g$ satisfy conditions 1, 2 and 3. Then:
>
> \begin{align*}
> h(g) &= -\int g\log g\\
> &= -\int g\log\left(\frac{g}{f^*}f^*\right)\\
> &= -D(g\,||\,f^*) -\int g\log f^*\\
> &\leq -\int g\log f^*\\
> &= -\int g\left(\lambda_0 + \sum_{i=1}^m\lambda_ir_i\right)\\
> &= -\int f^*\left(\lambda_0 + \sum_{i=1}^m\lambda_ir_i\right)\\
> &= -\int f^* \log f^*\\
> &= h(f^*)
> \end{align*}
>
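> Here the inequality is the information inequality $D(g\,||\,f^*)\geq 0$, and we may replace $g$ by $f^*$ in the integral because both satisfy conditions 2 and 3. Equality holds iff we have equality in the information inequality, i.e. iff $g = f^*$ a.e.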
>
>◻
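
As a numerical sanity check of Theorem 1.1, the sketch below takes $S = [0,\infty)$ with the single constraint $\mathbb{E}[X] = \alpha$ (so $m = 1$ and $r_1(x) = x$). The theorem then gives $f^*(x) = e^{\lambda_0 + \lambda_1 x}$, and the two constraints force $\lambda_1 = -1/\alpha$ and $\lambda_0 = -\log\alpha$, i.e. the exponential density with $h(f^*) = 1 + \log\alpha$. The choice of $S$, the value $\alpha = 2$, the competing densities and all function names are illustrative assumptions rather than part of the original argument, and the integrals are only approximated by a midpoint rule on a truncated interval.

```python
import math


def integrate(func, a, b, n=40_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))


def entropy(density, a, b):
    """Differential entropy -int f log f over [a, b], with 0 log 0 := 0."""
    def integrand(x):
        fx = density(x)
        return -fx * math.log(fx) if fx > 0.0 else 0.0
    return integrate(integrand, a, b)


alpha = 2.0            # assumed value of the mean constraint E[X] = alpha
upper = 40.0 * alpha   # truncation of S = [0, inf); the neglected tails are tiny
s = alpha * math.sqrt(math.pi / 2.0)  # half-normal scale that gives mean alpha

# Densities on [0, inf) that all have total mass 1 and mean alpha.
candidates = {
    "exponential (f*)": lambda x: math.exp(-x / alpha) / alpha,
    "uniform on [0, 2*alpha]": lambda x: 0.5 / alpha if x <= 2.0 * alpha else 0.0,
    "gamma, shape 2": lambda x: (2.0 / alpha) ** 2 * x * math.exp(-2.0 * x / alpha),
    "half-normal": lambda x: math.sqrt(2.0 / math.pi) / s * math.exp(-0.5 * (x / s) ** 2),
}

results = {}
for name, f in candidates.items():
    mass = integrate(f, 0.0, upper)
    mean = integrate(lambda x: x * f(x), 0.0, upper)
    results[name] = (mass, mean, entropy(f, 0.0, upper))
    print(f"{name:24s} mass={mass:.4f} mean={mean:.4f} h={results[name][2]:.4f}")

# Theorem 1.1 predicts h(f*) = 1 + log(alpha) as the largest value.
print("predicted maximum:", 1.0 + math.log(alpha))
```

Every candidate satisfies the same two constraints, so the theorem predicts that the exponential density has the largest entropy of the four.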

---

## 1.2. <a id='toc1_2_'></a>[Remark 1.2](#toc0_)

The maximum entropy can be infinite. For example, consider
$S = \mathbb{R}$ with the constraint $\mathbb{E}[X] = \mu$ for some fixed $\mu\in\mathbb{R}.$ Then Gaussian
distributions $X \sim\mathcal{N}(\mu,\sigma^2)$ 
satisfy the constraint for any variance $\sigma^2>0$. By
[Example 2.1.2](https://danieljamessmith.github.io/blog/posts/bte3/#example-2.1.2---entropy-of-a-univariate-normal-distribution) from a previous post we have
$$h(X) = \frac{1}{2}\log\left(2\pi e\sigma^2\right) \xrightarrow{\: \sigma^2 \to \infty \: }\infty.$$

In words, we can construct probability densities on $\mathbb{R}$ with arbitrarily
large differential entropy satisfying a first moment constraint $\mathbb{E}[X] = \mu$ by
considering Gaussian distributions $X \sim\mathcal{N}(\mu,\sigma^2)$ with fixed mean $\mu$ and increasing
variance $\sigma^2$.
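
The divergence can be seen numerically with a few lines of Python. This is only an illustrative sketch: the helper names `gaussian_entropy` and `variance_for_entropy` and the sample variances are assumptions introduced here, and entropies are measured in nats.

```python
import math


def gaussian_entropy(sigma2):
    """Differential entropy (in nats) of N(mu, sigma2); it does not depend on mu."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma2)


def variance_for_entropy(target):
    """Variance sigma2 at which gaussian_entropy(sigma2) equals target."""
    return math.exp(2.0 * target) / (2.0 * math.pi * math.e)


for sigma2 in (1.0, 1e2, 1e4, 1e8):
    print(f"sigma^2 = {sigma2:10.0e}   h = {gaussian_entropy(sigma2):.4f}")

# Any proposed bound on the entropy is exceeded by a large enough variance.
bound = 25.0
print(gaussian_entropy(2.0 * variance_for_entropy(bound)) > bound)
```

Inverting the entropy formula shows that the variance needed to reach a given entropy grows exponentially in that entropy, but it is always finite, so no finite bound can hold.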

---

## 1.3. <a id='toc1_3_'></a>[Remark 1.3](#toc0_)

Even if the maximum entropy is finite it need not be attained. That is,
the constants $\lambda_i$ introduced in
[Theorem 1.1](#toc1_1_) need not exist. 

For example, consider
probability densities $f$ on $S = \mathbb{R}$ with moment constraints up to
third order:
 $$\begin{aligned}
\int_{-\infty}^\infty f(x) \,\text{d} x &= 1,\\
\int_{-\infty}^\infty x^if(x)\,\text{d} x &= \alpha_i,\quad i=1,2,3.
\end{aligned}$$ 

Then by
[Theorem 1.1](#toc1_1_) the maximum entropy distribution (if it exists)
looks like

$$f(x) = \exp\left[\lambda_0 + \lambda_1x + \lambda_2x^2 + \lambda_3x^3\right].$$

However, $f\in L^1(\mathbb{R})$ only if $\lambda_3 = 0.$ Then we have
four equations in three unknowns and thus it is in general not possible
to determine the $\lambda_i$. 
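
To spell out the integrability step, here is a short worked argument that uses only the exponential form of $f$ and the standard moments of a Gaussian; the symbols $\mu$ and $\sigma^2$ are notation introduced for this sketch. If $\lambda_3 \neq 0$ the cubic term dominates the exponent for large $|x|$, so

$$\lambda_0 + \lambda_1x + \lambda_2x^2 + \lambda_3x^3 \longrightarrow +\infty \quad \text{as } x\to+\infty \text{ (if } \lambda_3>0\text{)} \text{ or as } x\to-\infty \text{ (if } \lambda_3<0\text{)},$$

and $f$ cannot be integrable. With $\lambda_3 = 0$ we also need $\lambda_2 < 0$, in which case $f$ is the density of $\mathcal{N}(\mu,\sigma^2)$ with $\mu = -\lambda_1/(2\lambda_2)$ and $\sigma^2 = -1/(2\lambda_2)$. The first and second moment constraints then force $\mu = \alpha_1$ and $\sigma^2 = \alpha_2 - \alpha_1^2$, which already fixes the third moment:

$$\int_{-\infty}^\infty x^3f(x)\,\text{d}x = \mu^3 + 3\mu\sigma^2 = 3\alpha_1\alpha_2 - 2\alpha_1^3.$$

So the $\lambda_i$ exist only in the special case $\alpha_3 = 3\alpha_1\alpha_2 - 2\alpha_1^3$.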

The failure of our technique in this case
is simply explained: although $\sup h(f) <\infty$, there is no
probability density $f$ satisfying our constraints that achieves this
supremum. 

To see that $\sup h(f) <\infty$, note that without the third
moment constraint the maximum entropy distribution would be
$\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2).$ Adding a further moment constraint
cannot increase the maximum entropy but can cause the supremum to
no longer be attained.

We can however get arbitrarily close to the supremum by perturbing
$\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2)$ at sufficiently large $x$ to force
the third moment constraint to hold while still satisfying the first and
second moment constraints. Hence

$$\begin{aligned}
\sup h(f) &= h(\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2))\\
&= \frac{1}{2}\log\left[2\pi e (\alpha_2-\alpha_1^2)\right].
\end{aligned}$$

This illustrates that *the maximum entropy may only be
$\epsilon$-achievable.*
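
One way to make the perturbation concrete is the following heuristic sketch. The weight $\epsilon$, the location $L$, the bump $\psi_L$ and the constant $c$ are notation introduced here for illustration, and the argument is not a complete proof. Let $\varphi$ be the density of the Gaussian with mean $\alpha_1$ and variance $\alpha_2-\alpha_1^2$, let $c = \alpha_3 - (3\alpha_1\alpha_2 - 2\alpha_1^3)$ be the amount by which its third moment misses $\alpha_3$, and suppose $c>0$ (for $c<0$ place the bump at $-L$ instead). Take $\psi_L$ to be the density of $\mathcal{N}(L,1)$, set $\epsilon = c/L^3$ and consider the mixture $g = (1-\epsilon)\varphi + \epsilon\psi_L$. The bump contributes

$$\begin{aligned}
\epsilon\int_{-\infty}^\infty x\,\psi_L(x)\,\text{d}x &= \frac{c}{L^2},\\
\epsilon\int_{-\infty}^\infty x^2\psi_L(x)\,\text{d}x &= \frac{c}{L} + \frac{c}{L^3},\\
\epsilon\int_{-\infty}^\infty x^3\psi_L(x)\,\text{d}x &= c + \frac{3c}{L^2}.
\end{aligned}$$

As $L\to\infty$ the bump supplies the missing third moment $c$ while its effect on the mass, mean and second moment vanishes, so one expects that a small adjustment of the mean and variance of the Gaussian part restores all constraints exactly. By concavity of the entropy, $h(g) \geq (1-\epsilon)h(\varphi) + \epsilon h(\psi_L)$, and $h(\psi_L) = \frac{1}{2}\log(2\pi e)$ does not depend on $L$, so the right-hand side tends to $h(\varphi)$ as $\epsilon\to0$.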


---

## 1.4. <a id='toc1_4_'></a>[Remark 1.4](#toc0_)

The Maxwell-Boltzmann distribution $M^f$ associated to a particle distribution function $f$ takes the form of the maximum entropy distribution $f^*$ given in [Theorem 1.1](#toc1_1_) with respect to moment constraints corresponding to fixed total energy and fixed particle number.

Boltzmann's $H$-theorem (see a previous post) asserts that the entropy $h(f)$ of such a distribution $f$ is non-decreasing in time.

Thus [Theorem 1.1](#toc1_1_), combined with the $H$-theorem, provides a rigorous mathematical underpinning for our physical intuition that the Maxwell-Boltzmann distribution $M^f$ should be the candidate limit of a particle distribution function $f$ as $t\rightarrow\infty$. This intuition aligns with the second law of thermodynamics, which asserts that the entropy of an isolated system increases until it reaches a maximum at equilibrium.
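
As a worked instance of this correspondence, here is a sketch under simplifying assumptions that are not taken from the text above: velocity space $S = \mathbb{R}^3$, a density normalised to unit mass, a single energy constraint $\int_{\mathbb{R}^3}|v|^2f(v)\,\text{d}v = 3T$, and units in which the particle mass and Boltzmann's constant equal $1$. The symbol $T$ is notation introduced here. Theorem 1.1 with $m=1$ and $r_1(v) = |v|^2$ gives $f^*(v) = \exp\left[\lambda_0 + \lambda_1|v|^2\right]$, and the two constraints are met by $\lambda_1 = -1/(2T)$ and $e^{\lambda_0} = (2\pi T)^{-3/2}$, that is

$$f^*(v) = \frac{1}{(2\pi T)^{3/2}}\exp\left[-\frac{|v|^2}{2T}\right],$$

which is a centred Maxwellian. Imposing the three momentum constraints $\int_{\mathbb{R}^3}v_jf(v)\,\text{d}v = u_j$ as well, and fixing the energy at $|u|^2 + 3T$, adds linear terms to the exponent and replaces $|v|^2$ by $|v-u|^2$.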

---

# 2. <a id='toc2_'></a>[References](#toc0_)

- [1] Thomas M. Cover and Joy A. Thomas. *Elements of Information Theory*. John Wiley & Sons, 1999.