# test problems

## Question 1: Maximum Entropy Probability Assignment

### Learning objectives
In this question you will:

- understand how probabilities are assigned when relevant information is unknown
- apply the principle and mathematical framework to a simple system
- extend the concept to other definitions of entropy within the simple system using numerical tools




Recall that the _Principle of Maximum Entropy_ provides a fundamental rationale for assigning prior probability distributions in statistical mechanics. By using a variational principle involving the constrained maximum of the Shannon entropy, we ensure that the distribution agrees with everything relevant that we do know, but does not pretend to information that we do not have.

In particular,  suppose we require a categorical probability distribution $p_1, \dotsc, p_n$ over a total of $n$ discrete possibilities $x_1, \dotsc, x_n$, considered mutually exclusive and exhaustive.  We therefore know that $p_j \geq 0$ for all $j = 1, \dotsc, n$, while $\sum\limits_{j=1}^{n} p_j = 1$.  In addition, we somehow know (or assume) the values of one or more expectation values,

\begin{align}
\langle f \rangle = \sum\limits_{j=1}^{n} p_j \, f(x_j)   &= \bar{f} ,\\
\langle g \rangle = \sum\limits_{j=1}^{n} p_j \, g(x_j)   &= \bar{g} ,\\
&\text{etc.}
\end{align}

Then the distribution maximizing the Shannon entropy
$$
S = - \sum\limits_{j=1}^n p_j \log p_j
$$
subject to the normalization and average-value constraints is of the "generalized canonical" form
$$
p_j = \frac{ e^{- \lambda f(x_j) - \mu g(x_j) - \dotsb} }{ Z(\lambda, \mu, \dotsc) } ,   \;\; j = 1, \dotsc, n
$$
where the _partition function_
$$
Z(\lambda, \mu, \dotsc) = \sum\limits_{j= 1}^{n} e^{- \lambda f(x_j) - \mu g(x_j) - \dotsb }
$$
ensures proper normalization, while the Lagrange multipliers $\lambda, \mu, \dotsc$ are chosen so as to satisfy the constraints on the expectation values:

\begin{align}
-\tfrac{\partial}{\partial \lambda} \ln Z &= \langle f \rangle = \bar{f} ,\\
-\tfrac{\partial}{\partial \mu} \ln Z &= \langle g\rangle = \bar{g} , \\
&\text{etc.}
\end{align}
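As a quick numerical check of the relation between $\ln Z$ and the expectation value, here is a minimal sketch for a single constraint function $f$. The helper names `log_Z` and `canonical`, the three-state values of $f$, and the value of $\lambda$ are illustrative assumptions, not part of the problem.

```python
import math

def log_Z(lam, f_values):
    """Log of the partition function Z(lambda) = sum_j exp(-lambda * f_j)."""
    exponents = [-lam * f for f in f_values]
    m = max(exponents)  # shift by the largest exponent to avoid overflow
    return m + math.log(sum(math.exp(e - m) for e in exponents))

def canonical(lam, f_values):
    """Canonical probabilities p_j = exp(-lambda * f_j) / Z(lambda)."""
    lz = log_Z(lam, f_values)
    return [math.exp(-lam * f - lz) for f in f_values]

# Illustrative (assumed) system: three states with f = 0, 1, 2.
f_values = [0.0, 1.0, 2.0]
lam = 0.7

p = canonical(lam, f_values)
mean_f = sum(pj * fj for pj, fj in zip(p, f_values))

# Central finite difference for -d(ln Z)/d(lambda).
h = 1e-5
minus_dlogZ = -(log_Z(lam + h, f_values) - log_Z(lam - h, f_values)) / (2 * h)

print(mean_f, minus_dlogZ)  # the two numbers should agree closely
```

The finite difference is used only as an independent check. In practice the expectation value can be computed directly from the probabilities.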


NOTE: Many sources suggest determining the Lagrange multipliers by solving these equations simultaneously. In general, though, it is not a good idea to optimize a function just by looking for the zeros of its gradient: root-finding algorithms tend to be less efficient and less robust than function-minimization algorithms because, if nothing else works, the latter can always make progress by moving downhill. It turns out that for fixed values of $\bar{f}, \bar{g}, \dotsc$, the auxiliary function $\Psi(\lambda, \mu, \dotsc) =  \ln Z(\lambda, \mu, \dotsc) + \lambda \bar{f} + \mu \bar{g} + \dotsb$ will be _minimized_ (not maximized, surprisingly...) at the desired values of $\lambda, \mu, \dotsc$. The maximized value of the entropy is then given by
$$
S(\bar{f}, \bar{g}, \dotsc) = \ln Z(\lambda, \mu, \dotsc) + \lambda \bar{f} + \mu \bar{g} + \dotsb,
$$
with the Lagrange multipliers so chosen.
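The minimization of $\Psi$ described in the note can be sketched with a simple derivative-free method. The example below uses a golden-section search, which is one possible choice and not necessarily the intended one, on a single multiplier. The three-state system, the target mean of $0.6$, and the search bracket $[-20, 20]$ are all illustrative assumptions.

```python
import math

def log_Z(lam, x_values):
    """ln Z(lambda) for a single constraint, with a shift for stability."""
    exponents = [-lam * x for x in x_values]
    m = max(exponents)
    return m + math.log(sum(math.exp(e - m) for e in exponents))

def psi(lam, x_values, x_bar):
    """Auxiliary function Psi(lambda) = ln Z(lambda) + lambda * x_bar."""
    return log_Z(lam, x_values) + lam * x_bar

def golden_section_min(func, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]: the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # Minimum lies in [c, b]: the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)

# Illustrative (assumed) system: states x = 0, 1, 2 with target mean 0.6.
x_values = [0.0, 1.0, 2.0]
x_bar = 0.6

lam_star = golden_section_min(lambda lam: psi(lam, x_values, x_bar), -20.0, 20.0)
lz = log_Z(lam_star, x_values)
p = [math.exp(-lam_star * x - lz) for x in x_values]

mean_x = sum(pj * xj for pj, xj in zip(p, x_values))
entropy = -sum(pj * math.log(pj) for pj in p)
psi_min = psi(lam_star, x_values, x_bar)

print(lam_star, mean_x, entropy, psi_min)
```

At the minimizer, the mean of the resulting distribution should match the target, and the Shannon entropy should coincide with the minimum value of $\Psi$, up to the tolerance of the search.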


### 1a. 

As an example, consider throws of one, possibly imbalanced, but otherwise standard die,
with the $j$th side marked with $x_j = j$ pips for $j = 1, \dotsc, 6$.

If the die is "fair," then the expected value of the next throw would be $\bar{x} = \tfrac{1}{6}( 1 + \dotsb + 6) = 3.5$. Instead, suppose that $\bar{x} = 4.5$. Obviously, the die (and/or the throwing mechanism) is weighted in favor of higher numbers, but by how much?

If this is all we know, the best we can do is use the maximum entropy distribution agreeing with this average.

Find the normalized, maximum-entropy distribution $p_1, \dotsc, p_6$ corresponding to the average $\bar{x} = 4.5$.

In [None]:
#Write your answer here

### 1b. 



What is the entropy of this distribution?  How does the entropy compare to the case of the fair die?


In [None]:
#Write your answer here

### 1c. 

What is the standard deviation $\sigma_{x}$ of the number of pips, given this maximum-entropy distribution? Compare to a fair die.

In [None]:
#Write your answer here

### 1d. 

Other variational principles have been suggested (albeit with questionable justification) for assigning probability distributions.

Find the distribution $p_1, \dotsc, p_6$ with maximum variance amongst all distributions with $\bar{x} = 4.5$.
(Make sure the probabilities are also nonnegative and normalized).

In [None]:
#Write your answer here

### 1e. 

What is the entropy of this distribution? In, say, $10\,000$ throws of the die, how many more ways are there to arrive at frequencies near the maximum entropy probabilities, compared to frequencies near the maximum-variance probabilities (as a factor)?

Of course, maximizing variance cannot possibly work as a general variational principle, if variance itself might be a constraint....
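For the counting question above, one standard estimate, sketched here under the assumption that $N$ is large and that sub-exponential prefactors can be ignored, follows from Stirling's approximation applied to the multinomial coefficient. With $S$ measured using the natural logarithm, the number of sequences of $N$ throws whose frequencies are close to $p_1, \dotsc, p_n$ is

$$
W(N; p_1, \dotsc, p_n) = \frac{N!}{\prod\limits_{j=1}^{n} (N p_j)!} \approx e^{N S(p_1, \dotsc, p_n)} ,
$$

so that for two candidate distributions, labeled A and B here purely for illustration,

$$
\frac{W_{\mathrm{A}}}{W_{\mathrm{B}}} \approx e^{N \, (S_{\mathrm{A}} - S_{\mathrm{B}})} .
$$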

In [None]:
#Write your answer here

### 1f. 



Find the distribution $p_1, \dotsc, p_6$ with maximum value of $-\log \Bigl[ \sum\limits_{j=1}^{6} p_j^2 \Bigr]$, amongst  all (non-negative and normalized) distributions with $\bar{x} = 4.5$.  This quantity is known as the _collision entropy_, or as the order-$2$ Rényi entropy.  Note that, unlike the case of Shannon entropy, non-negativity is not automatically guaranteed, but must be imposed as an explicit constraint.
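To see why non-negativity has to be imposed explicitly, note that maximizing $-\log \sum_j p_j^2$ is equivalent to minimizing $\sum_j p_j^2$. If only the normalization and mean constraints are kept, the Lagrange conditions give a distribution that is linear in $x_j$, namely $p_j = a + b\,x_j$, and nothing prevents some entries from being negative. The sketch below solves for $a$ and $b$ from the two constraints. The function name and the mean of $5.5$ are illustrative assumptions, chosen only to show the failure mode.

```python
def stationary_min_sum_squares(x_values, x_bar):
    """Stationary point of sum_j p_j**2 under normalization and mean constraints.

    Non-negativity is deliberately NOT imposed, so entries may be negative.
    The solution has the form p_j = a + b * x_j.
    """
    n = len(x_values)
    s1 = sum(x_values)                 # sum of x_j
    s2 = sum(x * x for x in x_values)  # sum of x_j**2
    det = n * s2 - s1 * s1
    # Solve  n*a + s1*b = 1  and  s1*a + s2*b = x_bar  for a and b.
    a = (s2 - s1 * x_bar) / det
    b = (n * x_bar - s1) / det
    return [a + b * x for x in x_values]

# Illustrative (assumed) mean, chosen to expose the problem.
x_values = [1, 2, 3, 4, 5, 6]
p = stationary_min_sum_squares(x_values, 5.5)

print(p)
print("sum:", sum(p), "mean:", sum(pj * xj for pj, xj in zip(p, x_values)))
print("has negative entries:", min(p) < 0)
```

Whenever such a stationary point has negative entries, it is not a valid probability distribution, and the constrained optimum must instead be found with the bounds $p_j \geq 0$ enforced, for example by a numerical optimizer that supports bound constraints.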

In [None]:
#Write your answer here

### 1g. 

What is the Shannon entropy of the resulting distribution? In, say, $10\,000$ throws of the die, how many more ways are there to arrive at frequencies near the maximum-entropy probabilities, compared to frequencies near the maximum-collision-entropy probabilities?


In [None]:
#Write your answer here

### 1h. 

Finally, find the distribution $p_1, \dotsc, p_6$ with maximum value of the so-called _min-entropy_ $\min\limits_j \log \tfrac{1}{p_j}$, amongst all (non-negative and normalized) distributions with $\bar{x} = 4.5$.

In [None]:
#Write your answer here

### 1i. 

 

What is the Shannon entropy of the resulting distribution? In, say, $10\,000$ throws of the die, how many more ways are there to arrive at frequencies near the maximum-entropy probabilities, compared to frequencies near the maximum-min-entropy probabilities?

In [None]:
#Write your answer here

### 1j. 

Make a plot of the three entropies as a function of $\bar{x}$.

In [None]:
#Write your answer here