In Bayesian statistics, one needs to be able to sample from the posterior distribution of unknown parameters given some data. In general high-dimensional problems this cannot be done analytically, but Markov chain Monte Carlo (MCMC) methods have become a popular alternative.

We want to sample from a distribution $\pi(x)$, $x \in \mathbb{R}^m$, but cannot do so directly. Instead we create a discrete-time Markov chain $X(n)$ (taking values in $\mathbb{R}^m$) such that $X(n)$ has equilibrium distribution $\pi$. Then
\begin{align}
    X(n) \to X \sim \pi \text{ in distribution} &\quad \text{as $n \to \infty$} \\
    \frac{1}{N}\sum_{n=1}^N f(X(n)) \to E_\pi(f(X)) &\quad \text{as $N \to \infty$}
\end{align}
where $f$ is any real-valued function on $\mathbb{R}^m$ for which the right-hand side above is well-defined. The second of these limits can be used to estimate means and variances of components of $X$, as well as to approximate their distribution functions. For example, $f(x) = x_i$ gives the mean of the $i$-th component $X_i$, and $f(x) = I(x_i \leq b)$ gives the distribution function of $X_i$ at the point $b$.
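
As a rough illustration of the second limit, the following Python sketch computes ergodic averages along a simple Markov chain. The chain is an assumption made only for this example: a one-dimensional autoregressive chain $X(n+1) = aX(n) + \sqrt{1-a^2}\,Z(n)$ with $Z(n) \sim N(0,1)$, whose equilibrium distribution is $N(0,1)$ when $|a| < 1$. The names `ar1_chain` and `ergodic_average` are hypothetical helpers. Since $m = 1$ here, $f(x) = x$ estimates the mean and $f(x) = I(x \leq 1)$ estimates the distribution function at $b = 1$.

```python
import math
import random

def ar1_chain(n_steps, a=0.5, x0=5.0, seed=0):
    """Simulate X(n+1) = a*X(n) + sqrt(1 - a^2)*Z(n) with Z(n) ~ N(0, 1).

    Illustrative assumption: for |a| < 1 the equilibrium distribution is N(0, 1).
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - a * a)
    x = x0  # deliberately started far from equilibrium
    path = []
    for _ in range(n_steps):
        x = a * x + scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def ergodic_average(path, f):
    """Return (1/N) * sum_{n=1}^{N} f(X(n))."""
    return sum(f(x) for x in path) / len(path)

path = ar1_chain(200_000)
mean_est = ergodic_average(path, lambda x: x)                        # estimates E(X) = 0
cdf_est = ergodic_average(path, lambda x: 1.0 if x <= 1.0 else 0.0)  # estimates P(X <= 1)
print(mean_est, cdf_est)
```

Under these assumptions the two averages should be close to $0$ and to the standard normal distribution function at $1$ (about $0.84$), despite the poor starting value.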

Suppose we do not have a tractable closed-form expression for the equilibrium density $\pi(x) = \pi(x_1, \dots, x_m)$, but we do know the induced full conditional densities $\pi(x_i | x_{-i})$, where $x_{-i}$ is the vector $x$ omitting the $i$-th component.

A systematic form of the Gibbs sampler algorithm proceeds as follows. First, pick an arbitrary starting value $x^0 = (x_1^0, \dots, x_m^0)$. Then successively make random drawings from the full conditional distributions $\pi(x_i | x_{-i})$, for $i = 1, \dots, m$ as follows:
\begin{align}
    &x_1^1 \text{ from } \pi(x_1 | x_{-1}^0) \\
    &x_2^1 \text{ from } \pi(x_2 | x_1^1, x_3^0, \dots, x_m^0) \\
    &x_3^1 \text{ from } \pi(x_3 | x_1^1, x_2^1, x_4^0, \dots, x_m^0) \\
    &\vdots \\
    &x_m^1 \text{ from } \pi(x_m | x_{-m}^1).
\end{align}
This cycle completes a transition from $x^0 = (x_1^0, \dots, x_m^0)$ to $x^1 = (x_1^1, \dots, x_m^1)$. Repeating the cycle produces a sequence $x^0, x^1, x^2, \dots$ which is a realisation of a Markov chain, known as the Gibbs sampler. We write $\pi(x, y)$ for the transition probability density of this Markov chain from state $x$ to state $y$.
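
The cycle above can be sketched in Python for a concrete target. The target is an assumption chosen for illustration, not part of the text: a bivariate normal with zero means, unit variances and correlation $\rho$, for which the full conditionals are $x_1 | x_2 \sim N(\rho x_2, 1 - \rho^2)$ and $x_2 | x_1 \sim N(\rho x_1, 1 - \rho^2)$. The function name `gibbs_bivariate_normal` is hypothetical.

```python
import math
import random

def gibbs_bivariate_normal(n_iter, rho=0.8, x0=(3.0, -3.0), seed=1):
    """Systematic-scan Gibbs sampler for an assumed bivariate normal target.

    Each iteration draws x1 from pi(x1 | x2) and then x2 from pi(x2 | x1),
    always conditioning on the most recently drawn value.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # standard deviation of each full conditional
    x1, x2 = x0                           # arbitrary starting value x^0
    chain = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, cond_sd)  # uses the old x2
        x2 = rng.gauss(rho * x1, cond_sd)  # uses the new x1
        chain.append((x1, x2))
    return chain

chain = gibbs_bivariate_normal(50_000)
mean_x1 = sum(x1 for x1, _ in chain) / len(chain)
mean_x1x2 = sum(x1 * x2 for x1, x2 in chain) / len(chain)
print(mean_x1, mean_x1x2)
```

For this assumed target, the two averages should be close to $0$ and to $\rho$ respectively.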


---

Assume that the Markov chain $X(n)$ takes values in a finite subset $S \subset \mathbb{R}^m$. To verify that $\pi$ is the equilibrium distribution for the Markov chain generated by the Gibbs sampler, we need to show that the global balance equation
\begin{equation}
    \sum_{x \in S} \pi(x)\pi(x, y) = \pi(y)
\end{equation}
holds for every $y \in S$.
The Gibbs sampler updates components sequentially. To move from state $x = (x_1, \dots, x_m)$ to state $y = (y_1, \dots, y_m)$, the algorithm proceeds in $m$ steps, where at the $k$-th step we draw $y_k$ from $\pi(y_k | y_1, \dots, y_{k-1}, x_{k+1}, \dots, x_m)$. The joint transition probability is the product of these conditional probabilities:
\begin{equation}
    \pi(x, y) = \pi(y_1 | x_2, \dots, x_m) \cdots \pi(y_m | y_1, \dots, y_{m-1}).
\end{equation}
We substitute this product into the left-hand side of the balance equation and perform the summation over $x = (x_1, \dots, x_m)$ one component at a time. The left-hand side is
\begin{equation}
    \sum_{x_m} \dots \sum_{x_1} \pi(x_1, \dots, x_m) \left[ \prod_{k=1}^m \pi(y_k | y_1, \dots, y_{k-1}, x_{k+1}, \dots, x_m) \right].
\end{equation}
None of the terms in the transition product $\pi(x, y)$ depends on $x_1$. We can therefore sum the joint distribution $\pi(x)$ over $x_1$,
\begin{equation}
    \sum_{x_1} \pi(x_1, x_2, \dots, x_m) = \pi(x_2, \dots, x_m),
\end{equation}
so that the expression becomes
\begin{equation}
    \sum_{x_m} \dots \sum_{x_2} \pi(x_2, \dots, x_m) \pi(y_1 | x_2, \dots, x_m) \cdot [\text{remaining terms}].
\end{equation}
Now consider the terms involving $x_2$. We have the marginal $\pi(x_2, \dots, x_m)$ and the first transition term $\pi(y_1 | x_2, \dots, x_m)$. The remaining transition terms (for $y_2, \dots, y_m$) do not depend on $x_2$. Combine the densities using the definition of conditional probability $\pi(A|B)\pi(B) = \pi(A,B)$ to get
\begin{equation}
    \pi(y_1 | x_2, \dots, x_m) \pi(x_2, x_3, \dots, x_m) = \pi(y_1, x_2, x_3, \dots, x_m),
\end{equation}
which we now sum over $x_2$:
\begin{equation}
    \sum_{x_2} \pi(y_1, x_2, x_3, \dots, x_m) = \pi(y_1, x_3, \dots, x_m).
\end{equation}
In effect, $x_1$ has been replaced by $y_1$ in the joint distribution, and $x_2$ has been summed out.

More generally, suppose that after summing over $x_1, \dots, x_{k-1}$ the summand is
\begin{equation}
    \pi(y_1, \dots, y_{k-1}, x_k, \dots, x_m) \pi(y_k | y_1, \dots, y_{k-1}, x_{k+1}, \dots, x_m) \cdot [\text{remaining terms}].
\end{equation}
The term $\pi(y_k | \dots)$ does not depend on $x_k$, and neither do the remaining terms, so we can sum the joint density over $x_k$ to get the marginal
\begin{equation}
    \sum_{x_k} \pi(y_1, \dots, y_{k-1}, x_k, x_{k+1}, \dots, x_m) = \pi(y_1, \dots, y_{k-1}, x_{k+1}, \dots, x_m).
\end{equation}
Multiply this by the transition term for step $k$,
\begin{equation}
    \pi(y_k | y_1, \dots, y_{k-1}, x_{k+1}, \dots, x_m) \pi(y_1, \dots, y_{k-1}, x_{k+1}, \dots, x_m).
\end{equation}
Using the probability product rule again, this combines to
\begin{equation}
    \pi(y_1, \dots, y_{k-1}, y_k, x_{k+1}, \dots, x_m).
\end{equation}
By repeating this process for all $k$ from $1$ to $m$, we successively replace every $x_i$ with $y_i$ in the joint distribution. After the final summation over $x_m$, and multiplying by the final transition term $\pi(y_m | y_1, \dots, y_{m-1})$, we obtain
\begin{equation}
    \pi(y_1, y_2, \dots, y_m) = \pi(y),
\end{equation}
as desired.
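
The argument can also be checked numerically on a small finite state space. The Python sketch below makes several assumptions for illustration: a randomly generated joint distribution on $\{0,1\} \times \{0,1,2\} \times \{0,1\}$ (so $m = 3$), and hypothetical helper names `random_joint_pmf`, `full_conditional` and `gibbs_transition`. It builds $\pi(x, y)$ as the product of full conditionals and compares $\sum_{x \in S} \pi(x)\pi(x, y)$ with $\pi(y)$.

```python
import itertools
import random

def random_joint_pmf(sizes, seed=0):
    """Return a strictly positive random joint pmf as a dict {state tuple: probability}."""
    rng = random.Random(seed)
    states = list(itertools.product(*(range(s) for s in sizes)))
    weights = [rng.random() + 0.1 for _ in states]
    total = sum(weights)
    return {s: w / total for s, w in zip(states, weights)}

def full_conditional(pmf, sizes, k, value, state):
    """pi(x_k = value | x_{-k}), with x_{-k} read from `state` (entry k is ignored)."""
    numer = pmf[state[:k] + (value,) + state[k + 1:]]
    denom = sum(pmf[state[:k] + (v,) + state[k + 1:]] for v in range(sizes[k]))
    return numer / denom

def gibbs_transition(pmf, sizes, x, y):
    """pi(x, y): product over k of pi(y_k | y_1..y_{k-1}, x_{k+1}..x_m)."""
    prob = 1.0
    for k in range(len(sizes)):
        mixed = y[:k] + x[k:]  # already-updated y components, then old x components
        prob *= full_conditional(pmf, sizes, k, y[k], mixed)
    return prob

sizes = (2, 3, 2)
pmf = random_joint_pmf(sizes)
states = list(pmf)

# Left-hand side of the global balance equation for every target state y.
lhs = {y: sum(pmf[x] * gibbs_transition(pmf, sizes, x, y) for x in states)
       for y in states}
max_error = max(abs(lhs[y] - pmf[y]) for y in states)

# Each row of the transition matrix should sum to one.
row_sums = [sum(gibbs_transition(pmf, sizes, x, y) for y in states) for x in states]
print(max_error)
```

Up to floating-point rounding, `max_error` should be zero, in agreement with the derivation.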

---

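Global balance shows that $\pi$ is a stationary distribution of the Gibbs sampler; under the usual irreducibility and aperiodicity conditions, it can be shown that $\pi$ is also its equilibrium distribution. Thus our estimate of $E_\pi(f(X))$, taken over $N$ iterations, is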
\begin{equation}
    \frac{1}{N} \sum_{n=1}^N f(x^n).
\end{equation}
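
As a minimal sketch of this estimate, the Python code below runs a Gibbs sampler on a small discrete target and compares the average with the exact expectation. The joint probability table `PMF` on $\{0,1\} \times \{0,1,2\}$, the choice $f(x) = x_2$, and the helper names `draw` and `gibbs_estimate` are all assumptions made for illustration.

```python
import random

# Assumed joint pmf: PMF[a][b] = pi(x_1 = a, x_2 = b); entries sum to one.
PMF = [[0.10, 0.20, 0.10],
       [0.25, 0.05, 0.30]]

def draw(weights, rng):
    """Draw an index with probability proportional to `weights`."""
    return rng.choices(range(len(weights)), weights=weights)[0]

def gibbs_estimate(f, n_iter, seed=2):
    """Return (1/N) * sum_n f(x^n) along a systematic-scan Gibbs chain."""
    rng = random.Random(seed)
    x1, x2 = 0, 0  # arbitrary starting value x^0
    total = 0.0
    for _ in range(n_iter):
        x1 = draw([PMF[v][x2] for v in range(2)], rng)  # x1 from pi(x1 | x2)
        x2 = draw(PMF[x1], rng)                         # x2 from pi(x2 | x1)
        total += f(x1, x2)
    return total / n_iter

# Exact E_pi(X_2), computed directly from the table for comparison.
exact = sum(PMF[a][b] * b for a in range(2) for b in range(3))
estimate = gibbs_estimate(lambda x1, x2: x2, 100_000)
print(exact, estimate)
```

The unnormalised column and row of the table can be passed directly as weights, because normalising them gives exactly the full conditionals $\pi(x_1 | x_2)$ and $\pi(x_2 | x_1)$.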