---
title: Maximal Entropy
layout: collection
permalink: /Computational-Biology/Maximal-Entropy
collection: Computational-Biology
mathjax: true
toc: true
categories:
  - study
tags:
  - mathematics
  - statistics
---

### Laplace's Method

Assume we are estimating a single parameter $\mu$ and, given our data $ \mathcal{D}$, we have calculated a posterior distribution $ \mathbb{P}(\mu | \mathcal{D})$. We may want to integrate the posterior over some region, but the posterior can be a complicated function to integrate. If the posterior has a single peak, we can approximate it by expanding the logarithm of the posterior around that peak.

Let $\mu_*$ be the mode of the distribution, i.e. the location of the peak of $ \mathbb{P}(\mu | \mathcal{D})$.
We define $L(\mu) = \log \mathbb{P}(\mu | \mathcal{D})$. Expanding this around $\mu_*$ in a Taylor series gives us

$$
L(\mu) 
\approx 
L(\mu_*) + 
\frac{\partial}{\partial \mu} L(\mu) \Bigr|_{\mu = \mu_*} (\mu - \mu_*)+
\frac{1}{2} \frac{\partial^2}{\partial \mu^2} L(\mu) \Bigr|_{\mu = \mu_*} (\mu - \mu_*)^2 + ...
$$
We keep only the terms up to second order as a good enough approximation. Because the posterior has a maximum at $\mu_*$, the first derivative vanishes there, $\frac{\partial}{\partial \mu} L(\mu) \Bigr|_{\mu = \mu_*} = 0$, and we get

$$
L(\mu) 
\approx 
L(\mu_*) + \frac{1}{2} L''(\mu_*) (\mu - \mu_*)^2
$$

Since $L''(\mu_*)$ is negative at a maximum, we set $L''(\mu_*) = -\frac{1}{\sigma^2}$. Taking the exponential, we get a Gaussian:

$$
\mathbb{P}(\mu | \mathcal{D}) = e^{L(\mu)} \approx e^{L(\mu_*)} e^{ - \frac{(\mu - \mu_*)^2}{2 \sigma^2}}
$$
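
As a numerical illustration of the single-parameter case, the following sketch applies the approximation to a hypothetical example that is not part of the derivation above: a coin with bias $\mu$, 30 heads in 100 tosses, and a uniform prior. The names `log_posterior` and `laplace_1d`, the golden-section search, and the finite-difference step size are assumptions chosen for illustration. The width is obtained from $\sigma^2 = -1/L''(\mu_*)$, which is positive because $L''$ is negative at a maximum.

```python
import math

def log_posterior(mu, heads=30, tosses=100):
    """Unnormalized log posterior of a coin bias mu under a uniform prior (assumed example)."""
    return heads * math.log(mu) + (tosses - heads) * math.log(1.0 - mu)

def laplace_1d(log_post, lo, hi, h=1e-4, iters=200):
    """Return (mode, sigma) of the Gaussian approximation to exp(log_post) on [lo, hi]."""
    # Golden-section search for the maximum of a single-peaked function.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_post(c) > log_post(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    mode = 0.5 * (a + b)
    # Central finite difference for L''(mode); negative at a maximum.
    second = (log_post(mode + h) - 2.0 * log_post(mode) + log_post(mode - h)) / h**2
    sigma = math.sqrt(-1.0 / second)
    return mode, sigma

mode, sigma = laplace_1d(log_posterior, 1e-6, 1.0 - 1e-6)
print(mode, sigma)
```

For this example the mode should come out close to $0.3$ and $\sigma$ close to $\sqrt{0.3 \cdot 0.7 / 100} \approx 0.046$, which is the analytic result for this particular posterior.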

Assume now that we have $n$ parameters $ \mathbf{\alpha} = (\alpha_1, \alpha_2, ..., \alpha_n)$ and a likelihood function $ \mathbb{P}(\mathcal{D} | \mathbf{\alpha})$.
Using a uniform prior we get the posterior $ \mathbb{P}(\mathbf{\alpha} | \mathcal{D}) \propto \mathbb{P}(\mathcal{D} | \mathbf{\alpha})$. Let $ \mathbf{\alpha}^*$ be the value that maximizes the posterior.

Expanding the logarithm of the posterior around $ \mathbf{\alpha}^*$ we get

$$
\log \mathbb{P}(\mathbf{\alpha} | \mathcal{D}) 
= 
\log \mathbb{P}(\mathbf{\alpha}^* | \mathcal{D}) +
\sum_{i} (\alpha_i - \alpha_i^*) \frac{\partial \log \mathbb{P}(\mathbf{\alpha} | \mathcal{D})}{\partial \alpha_i } \Bigr|_{\mathbf{\alpha} = \mathbf{\alpha}^*} + 
\frac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*) (\alpha_j - \alpha_j^*)  \frac{\partial^2 \log \mathbb{P}(\mathbf{\alpha} | \mathcal{D})}{\partial \alpha_i \partial \alpha_j} \Bigr|_{\mathbf{\alpha} = \mathbf{\alpha}^*} + ...
$$ 

Again the first-order term vanishes because of the maximum at $ \mathbf{\alpha}^*$. The second derivatives form the so-called Hessian matrix.

$$
H_{ij} = \frac{\partial^2 \log \mathbb{P}(\mathbf{\alpha} | \mathcal{D})}{\partial \alpha_i \partial \alpha_j} \Bigr|_{\mathbf{\alpha}=\mathbf{\alpha}^*}
$$

If we set $B_{ij} = -H_{ij}$, the posterior is approximated by a multivariate Gaussian

$$
\mathbb{P}(\mathbf{\alpha} | \mathcal{D}) \propto \exp \left( - \frac{1}{2} \sum_{i,j}(\alpha_i - \alpha_i^*) B_{ij} (\alpha_j - \alpha_j^*) \right)
$$

The covariance matrix is then given by $B^{-1}$, the inverse of the negative Hessian. If we now integrate out all parameters but one, the marginal distribution is again a Gaussian, with variance $\sigma^2_i = (B^{-1})_{ii}$:

$$
\mathbb{P}(\alpha_i | \mathcal{D}) \propto \exp \left( -\frac{1}{2} \frac{(\alpha_i - \alpha_i^*)^2}{(B^{-1})_{ii}} \right)
$$
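
The multivariate recipe can be sketched numerically as follows. The two-parameter `log_posterior` below is a hypothetical example (exactly quadratic, with mode $(1, -1)$ and negative Hessian $B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$), and the helper names `hessian` and `inverse_2x2` are assumptions made for illustration. The sketch estimates the Hessian by finite differences, inverts $B = -H$, and reads the marginal variance off the diagonal of $B^{-1}$.

```python
def log_posterior(a):
    """Hypothetical log posterior with mode (1, -1) and correlated parameters."""
    x, y = a[0] - 1.0, a[1] + 1.0
    # -1/2 * (B_00 x^2 + 2 B_01 x y + B_11 y^2) with B = [[2, 1], [1, 1]]
    return -0.5 * (2.0 * x * x + 2.0 * x * y + 1.0 * y * y)

def hessian(f, point, h=1e-3):
    """Central finite-difference Hessian of f at point."""
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(point)
                p[i] += di * h
                p[j] += dj * h
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return H

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

mode = [1.0, -1.0]                       # known by construction in this example
H = hessian(log_posterior, mode)
B = [[-H[i][j] for j in range(2)] for i in range(2)]
cov = inverse_2x2(B)                     # covariance matrix B^{-1}

marginal_var_0 = cov[0][0]               # (B^{-1})_{00}: variance after integrating out alpha_1
conditional_var_0 = 1.0 / B[0][0]        # variance with alpha_1 held fixed at its mode
print(marginal_var_0, conditional_var_0)
```

In this example the marginal variance of the first parameter is $1$, while $1/B_{00} = 0.5$ is only the variance with the other parameter held fixed at its mode. The two agree only when the parameters are uncorrelated, which is why the inverse matrix is needed.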

### ----

Assume we have $N$ genes in total, where each gene can have $m=0,1,2,...$ mRNA copies in a given cell. 
On average, a gene has $\bar{m}$ mRNAs per cell, i.e. there are $M=\bar{m}N$ mRNAs per cell. 
We would now like to know the probability $ \mathbb{P}(m | I)$ that one particular gene has $m$ mRNAs in the cell.

1. Standard way

For each gene $g$ we specify the number $m_g$ of mRNAs that it has in the cell, i.e. the mutually exclusive and exhaustive possibilities are vectors $ \mathbf{m} = (m_1, m_2, ... , m_N)$. Our prior information $I$ specifies that, among these possible states, only those have nonzero probability for which

$$
M(\mathbf{m}) = \sum_{g=1}^{N} m_g = M = N \bar{m}
$$

Thus we have a uniform prior restricted to the states that satisfy this constraint:

$$
\mathbb{P}(\mathbf{m} | I) = \frac{\delta(M (\mathbf{m}) - N \bar{m})}{\sum_{\mathbf{m}'} \delta(M (\mathbf{m}') - N \bar{m})}
$$

and for a single gene we have

$$
\mathbb{P}(m_g = m | I) = \frac{\sum_{\mathbf{m}} \delta(M(\mathbf{m}) - N \bar{m}) \delta(m_g - m)}{\sum_{\mathbf{m}'} \delta(M (\mathbf{m}') - N \bar{m})}
$$
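
For small $N$ the two sums can be evaluated by direct enumeration, which makes the formula concrete. The sketch below does this for the assumed values $N = 4$ and $\bar{m} = 2$; the function name `single_gene_distribution` is hypothetical. As a check it compares the result against the standard stars-and-bars count $\binom{M - m + N - 2}{N - 2} / \binom{M + N - 1}{N - 1}$, a combinatorial identity that is not derived in the text above.

```python
from fractions import Fraction
from itertools import product
from math import comb

def single_gene_distribution(N, M, g=0):
    """Exact P(m_g = m | I) by enumerating all vectors m with sum M."""
    counts = [0] * (M + 1)
    for m_vec in product(range(M + 1), repeat=N):
        if sum(m_vec) == M:          # delta(M(m) - N * m_bar)
            counts[m_vec[g]] += 1    # delta(m_g - m)
    total = sum(counts)              # normalizing denominator
    return [Fraction(c, total) for c in counts]

N, m_bar = 4, 2                      # assumed small example values
M = N * m_bar
probs = single_gene_distribution(N, M)

# Closed-form count from stars and bars, used here only as a cross-check.
closed_form = [Fraction(comb(M - m + N - 2, N - 2), comb(M + N - 1, N - 1))
               for m in range(M + 1)]

for m, p in enumerate(probs):
    print(m, p, float(p))
```

The enumeration visits $(M+1)^N$ vectors, so it is only practical for very small $N$.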

For large $N$ these calculations become difficult, which is why we look at an approximation.
Instead of looking at the number of mRNAs $m_g$ that each gene $g$ has, we look at the vector $ \mathbf{n} = (n_0, n_1, n_2, ..., n_M)$, which counts how many genes there are with 0 mRNAs ($n_0$), with 1 mRNA ($n_1$), etc. 
Many different vectors $ \mathbf{m}$ correspond to the same vector $ \mathbf{n}$. 
Let $W(\mathbf{n})$ denote the number of vectors $ \mathbf{m}$ that all have the same count vector $ \mathbf{n}$.
The numbers $W(\mathbf{n})$ are given by the multinomial coefficients

$$
W(\mathbf{n}) = \frac{N!}{n_0 ! n_1 ! ... n_M!}
$$
With our constraint $M( \mathbf{n}) = \sum_{m=0}^{M} m \, n_m = M = N \bar{m}$, our probability distribution becomes

$$
\mathbb{P}(\mathbf{n} | I) = \frac{W(\mathbf{n}) \delta(M(\mathbf{n}) - N\bar{m})}{\sum_{\mathbf{n}'} W(\mathbf{n}')\delta(M(\mathbf{n'}) - N\bar{m})}
$$
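
The claim that $W(\mathbf{n})$ counts the vectors $ \mathbf{m}$ sharing a count vector can be checked by brute force for a small case. The following sketch uses the assumed values $N = 4$ and $M = 8$; the helper names `count_vector` and `multiplicity` are hypothetical. It groups all allowed vectors $ \mathbf{m}$ by their count vector and compares the group sizes with the multinomial coefficient.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial

def count_vector(m_vec, M):
    """n_m = number of genes with exactly m mRNAs, for m = 0..M."""
    tally = Counter(m_vec)
    return tuple(tally.get(m, 0) for m in range(M + 1))

def multiplicity(n_vec):
    """W(n) = N! / (n_0! n_1! ... n_M!), with N = sum of the n_m."""
    w = factorial(sum(n_vec))
    for n_m in n_vec:
        w //= factorial(n_m)
    return w

N, M = 4, 8                          # assumed small example values

# Number of allowed vectors m (sum equal to M) observed for each count vector n.
observed = Counter(
    count_vector(m_vec, M)
    for m_vec in product(range(M + 1), repeat=N)
    if sum(m_vec) == M
)

# P(n | I) = W(n) / sum over allowed n' of W(n').
total = sum(multiplicity(n) for n in observed)
prob_n = {n: Fraction(multiplicity(n), total) for n in observed}
print(len(observed), total)
```

Every count vector produced this way satisfies the constraint by construction, so the delta function in the formula corresponds to the `sum(m_vec) == M` filter.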