# SA-DISCnet Statistics: Exercises (Challenging version)


Import Python modules

import corner
import emcee
import math
import mpmath
import numpy as np
import pdb
import matplotlib.pyplot as plt
import scipy
import scipy.special
from scipy import optimize

## Q1 Maximum likelihood and Bayes example: exponential distribution

(a) Suppose a set of events has a distribution of times taken for the events, and you are trying to determine the mean time for this distribution. The distribution has $p(t) \propto e^{-t/\tau}$. You measure $N$ event times $t_i$. If there is no limit on times that can be measured, the PDF for measured separation time $t$ is $$p(t) = \frac{e^{-t/\tau}}{\tau}.$$ Show that this correctly normalises $p(t)$.

Use the maximum likelihood method to show that the estimate of $\tau$ and the variance of that estimate are
given by $$\widehat\tau = \frac{1}{N} \sum_i t_i$$ and $$V(\widehat\tau) = \frac{\widehat{\tau}^2}{N}.$$
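
The two results can be checked numerically before attempting the algebra. The sketch below is only an illustration: the number of repeated experiments (`n_trials`), the seed and the value $\tau = 10$ s are assumptions chosen for the check, not part of the exercise. It repeats the experiment many times and compares the scatter of $\widehat\tau$ with $\tau^2/N$.

```python
import numpy as np

rng = np.random.default_rng(1)
tau_true, n_events, n_trials = 10.0, 1000, 2000  # illustrative values

# Each row is one simulated experiment of n_events exponential times.
times = rng.exponential(tau_true, size=(n_trials, n_events))

# The maximum-likelihood estimate is the sample mean of each experiment.
tau_hat = times.mean(axis=1)

# Compare the scatter of the estimates with the predicted variance tau^2 / N.
empirical_var = tau_hat.var()
predicted_var = tau_true**2 / n_events
print(f"mean of tau_hat    = {tau_hat.mean():.3f}")
print(f"empirical variance = {empirical_var:.4f}")
print(f"predicted variance = {predicted_var:.4f}")
```

If the derivation is right, the two variances should agree to within a few per cent for this number of trials.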


(b) Now suppose that you cannot measure any times longer than $T$. The truncated
PDF (normalised to 1) is then $$p(t) =\frac{e^{-t/\tau}}{\int_0^T e^{-t/\tau} dt} =\frac{1}{\tau}e^{-t/\tau}(1-e^{-T/\tau})^{-1}.$$ By differentiating the log likelihood $l$ with respect to  $\tau$ and setting it to zero, show that the maximum-likelihood estimate of $\tau$ is given by $$\widehat{\tau }=\frac{1}{N}\sum t_{i}+ \frac{Te^{-T/\widehat{\tau }}}{\left( 1-e^{-T/\widehat{\tau }}\right) }. \,\,\,(*)$$



(c) Write a computer script to generate $N = 1000$ times from the original PDF
with $\tau = 10$ s, keeping only times below a maximum of $T = 15$ s. Plot a histogram of the times you have
generated.
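
One possible way to do this is rejection: draw from the untruncated exponential and discard anything at or above $T$. The function name `sample_truncated_exponential`, the seed and the bin count are assumptions made for this sketch.

```python
import numpy as np

def sample_truncated_exponential(n, tau, t_max, rng):
    """Draw n times from exp(-t/tau)/tau, keeping only those below t_max."""
    samples = np.empty(0)
    while samples.size < n:
        draw = rng.exponential(tau, size=2 * n)
        samples = np.concatenate([samples, draw[draw < t_max]])
    return samples[:n]

rng = np.random.default_rng(42)
times = sample_truncated_exponential(1000, tau=10.0, t_max=15.0, rng=rng)

# Bin the times; the same arrays can be handed to a plotting routine.
counts, edges = np.histogram(times, bins=30, range=(0.0, 15.0))
print(counts)
```

With matplotlib imported as `plt`, `plt.hist(times, bins=30, range=(0.0, 15.0))` would draw the same histogram directly.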

(d) Use equation (*) to estimate $\widehat\tau$ from your generated data. Is this consistent with the true
value, given the simplified variance estimate $V(\tau) = \widehat{\tau}^2/N$ from part (a)?
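
Equation (*) is implicit in $\widehat\tau$, so it has to be solved numerically. A minimal sketch using fixed-point iteration follows; this is one choice among several (a root finder such as those in `scipy.optimize` would also work). The helper name `solve_tau_hat`, the tolerance and the seed are assumptions. The data are generated here by inverting the truncated CDF so that the block is self-contained.

```python
import numpy as np

def solve_tau_hat(times, t_max, tol=1e-10, max_iter=10_000):
    """Solve equation (*) for tau-hat by fixed-point iteration."""
    mean_t = float(np.mean(times))
    tau = mean_t  # starting guess: the untruncated estimate
    for _ in range(max_iter):
        # T e^{-T/tau} / (1 - e^{-T/tau}) is rewritten as T / (e^{T/tau} - 1).
        tau_new = mean_t + t_max / np.expm1(t_max / tau)
        if abs(tau_new - tau) < tol:
            return tau_new
        tau = tau_new
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(42)
tau_true, t_max, n_events = 10.0, 15.0, 1000

# Inverse-CDF draw from the truncated exponential: t = -tau ln(1 - u (1 - e^{-T/tau})).
u = rng.uniform(size=n_events)
times = -tau_true * np.log1p(u * np.expm1(-t_max / tau_true))

tau_hat = solve_tau_hat(times, t_max)
sigma_simple = tau_hat / np.sqrt(n_events)  # simplified error from part (a)
print(f"tau_hat = {tau_hat:.3f} s, simplified error = {sigma_simple:.3f} s")
```

The iteration converges slowly when $T/\widehat\tau$ is small, which is why the iteration cap is generous.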



(e) Suppose that a previous experiment has estimated the mean time to be $\widehat\tau=\tau_p\pm\sigma$.
Write down an expression for the posterior probability after measuring $N$
times $t_i < T$, including the prior information, and maximise it with respect to $\tau$.




What maximum-posterior values do you obtain for $\widehat\tau$ with the prior estimate $\tau_p$ = 8 s, $\sigma$ = 1 s,
and with computer-generated data with $\tau$ = 10 s, $T$ = 15 s, for sample sizes $N$ =
10, 100, 1000 and 10000? Comment on your results.
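
A minimal sketch of this comparison is given below. It assumes the prior is Gaussian in $\tau$ with mean $\tau_p$ and width $\sigma$, and it locates the maximum by brute force on a fine grid rather than by solving the stationarity condition. The function names, the grid and the seed are assumptions made for illustration.

```python
import numpy as np

def sample_truncated(n, tau, t_max, rng):
    """Inverse-CDF draw from the exponential PDF truncated at t_max."""
    u = rng.uniform(size=n)
    return -tau * np.log1p(u * np.expm1(-t_max / tau))

def log_posterior(tau, times, t_max, tau_prior, sigma_prior):
    """Truncated-exponential log likelihood plus Gaussian log prior (constants dropped)."""
    n = times.size
    log_like = (-n * np.log(tau) - times.sum() / tau
                - n * np.log(-np.expm1(-t_max / tau)))
    log_prior = -0.5 * ((tau - tau_prior) / sigma_prior) ** 2
    return log_like + log_prior

rng = np.random.default_rng(3)
tau_grid = np.linspace(1.0, 40.0, 39_001)  # candidate tau values, 0.001 s apart
map_estimates = {}
for n in (10, 100, 1000, 10_000):
    times = sample_truncated(n, tau=10.0, t_max=15.0, rng=rng)
    log_post = log_posterior(tau_grid, times, 15.0, tau_prior=8.0, sigma_prior=1.0)
    map_estimates[n] = tau_grid[np.argmax(log_post)]
    print(f"N = {n:>5d}: maximum-posterior tau = {map_estimates[n]:.3f} s")
```

For small $N$ the prior should dominate and the estimate should sit near $\tau_p$; as $N$ grows the data should pull it towards the value used to generate them.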




## Q2 Bayes’ theorem example: constrained measurements

A small quantity of powder is being weighed on a balance which gives a reading of $x\pm\sigma$ g.
Assuming a uniform prior on the true mass $X$ for any positive value, $p(X > 0)$ = const, zero
otherwise, show that the posterior probability density for the true mass is given by
$$p(X|x) = \sqrt{\frac{2}{\pi}}\frac{1}{\sigma}\frac{e^{-(x-X)^2/2\sigma^2}}{\mathrm{erfc}(-x/\sqrt{2}\sigma)} $$
for $X > 0$, zero otherwise.

Plot $p(X|x)$ for $X$ between 0 and 1 g for $\sigma$ = 0.2 g and for mass readings $x$ = -0.3, -0.1, 0.1, 0.3 g
(plot a separate line for each of the four readings).
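
The curves to be plotted can be evaluated as in the sketch below, which also checks numerically that each one integrates to 1 over $X > 0$. The function name `posterior` and the grids are assumptions; the plotting call itself is left out so that the block depends only on NumPy and the standard library.

```python
import math
import numpy as np

def posterior(X, x, sigma):
    """p(X|x) for a Gaussian reading x with a flat prior on X > 0."""
    norm = math.sqrt(2.0 / math.pi) / (
        sigma * math.erfc(-x / (math.sqrt(2.0) * sigma))
    )
    density = norm * np.exp(-((x - X) ** 2) / (2.0 * sigma**2))
    return np.where(X > 0.0, density, 0.0)

sigma = 0.2
readings = [-0.3, -0.1, 0.1, 0.3]
X_plot = np.linspace(0.0, 1.0, 501)[1:]  # plotting range, excluding X = 0
curves = {x: posterior(X_plot, x, sigma) for x in readings}

# Sanity check: trapezoid-rule integral over a wider range should be close to 1.
X_wide = np.linspace(1e-9, 3.0, 6001)
integrals = {}
for x in readings:
    p = posterior(X_wide, x, sigma)
    integrals[x] = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(X_wide))
    print(f"x = {x:+.1f} g: integral = {integrals[x]:.4f}")
```

Each entry of `curves` can then be drawn with, for example, `plt.plot(X_plot, curves[x], label=f"x = {x} g")`. For negative readings the posterior peaks at the boundary $X \to 0$ rather than at the reading itself.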



## Q3 MCMC example

In the first part of this question, you will write a Monte Carlo program to simulate a sample drawn from a particular parameterised PDF. In the second part, you will use the simulated data that you have generated, and do a maximum likelihood analysis to determine the parameters of the model. If
you are successful, your analysis should yield the initial input parameters - always a useful
check to perform when you are developing Monte Carlo routines!

(a) Event Generation.

i) The probability density function for detecting a mass $m$ in a particle physics experiment is

$$p(m) = A \frac{\Gamma^2/4}{(m-m_0)^2+\Gamma^2/4}$$

By changing variables to $x = 2 (m - m_0)/\Gamma$, integrate over $m$ to normalise the distribution, and thus prove that
$$A=\frac{2}{\pi \Gamma}.$$
You will need to assume that $m_0\gg\Gamma$ in order to put sensible limits on your integral.



ii) Now, given the random number generator $U(0, 1)$ provided with any computer language, we want to generate a sample from this distribution. The best way is the transformation method; show that the transformation
$$m = m_0 + \frac{\Gamma}{2}\tan[(x-1/2)\pi]$$
generates the distribution as required, where $x$ is a random number generated from $U(0, 1)$.


iii) Generate 10,000 masses $m$ in this way, with $m_0$ = 784 MeV, $\Gamma$ = 12 MeV, and put
them into a histogram with, say, 50 bins covering the mass range $m$ = 760-810 MeV.

Plot this histogram.
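
A minimal sketch of the generation step, using the transformation from part (ii), is shown below. The seed and variable names are assumptions. As a rough check, half of the generated masses should fall within $m_0 \pm \Gamma/2$, since $\Gamma$ is the full width at half maximum of this distribution.

```python
import numpy as np

m0, gamma, n_events = 784.0, 12.0, 10_000
rng = np.random.default_rng(7)

# Transformation method: map uniform deviates through the inverse CDF.
u = rng.uniform(0.0, 1.0, size=n_events)
masses = m0 + 0.5 * gamma * np.tan((u - 0.5) * np.pi)

# 50 bins over 760-810 MeV; events in the long tails fall outside the range.
counts, edges = np.histogram(masses, bins=50, range=(760.0, 810.0))

frac_core = np.mean(np.abs(masses - m0) < 0.5 * gamma)
print(f"events inside histogram range: {counts.sum()} of {n_events}")
print(f"fraction within m0 +/- Gamma/2: {frac_core:.3f}")
```

The histogram can be drawn with `plt.hist(masses, bins=50, range=(760.0, 810.0))` once matplotlib is imported as `plt`.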


(b) Now, imagine that we have carried out an experiment with a sample following this distribution. We shall use the Maximum Likelihood Method to find the best
estimate of $m_0$ and $\Gamma$. Note that the MLM does
not require you to bin the data in histograms - you can work with each "raw" event.

i) Using the normalised distribution, give
an expression for the log likelihood for $N$ events. Write down the equations that must
be satisfied for maximum likelihood. These are a nasty pair of implicit equations
which you don't need to solve!


ii) Plot log likelihood vs. mass $m_0$ in the mass range 760-810 MeV, keeping $\Gamma$ constant at 12 MeV.

See the code in breit_wigner() below.



iii) Plot log likelihood vs. $\Gamma$ in the range 8-16 MeV, keeping $m_0$ constant at 784
MeV.
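
The two scans in (ii) and (iii) can be sketched as follows. With $A = 2/(\pi\Gamma)$ the log likelihood is $\ln L = N\ln(\Gamma/2\pi) - \sum_i \ln[(m_i - m_0)^2 + \Gamma^2/4]$. The function name `log_likelihood`, the grid spacings and the seed are assumptions made for this illustration; it is not the notebook's own `breit_wigner()` code, which is not reproduced here.

```python
import numpy as np

def log_likelihood(masses, m0, gamma):
    """ln L = N ln(Gamma / 2 pi) - sum_i ln[(m_i - m0)^2 + Gamma^2 / 4]."""
    return masses.size * np.log(gamma / (2.0 * np.pi)) - np.sum(
        np.log((masses - m0) ** 2 + 0.25 * gamma**2)
    )

# Simulated data set: 10,000 masses with m0 = 784 MeV, Gamma = 12 MeV.
rng = np.random.default_rng(7)
masses = 784.0 + 6.0 * np.tan((rng.uniform(size=10_000) - 0.5) * np.pi)

# Scan m0 at fixed Gamma = 12 MeV.
m0_grid = np.linspace(760.0, 810.0, 501)
lnl_vs_m0 = np.array([log_likelihood(masses, m0, 12.0) for m0 in m0_grid])

# Scan Gamma at fixed m0 = 784 MeV.
gamma_grid = np.linspace(8.0, 16.0, 401)
lnl_vs_gamma = np.array([log_likelihood(masses, 784.0, g) for g in gamma_grid])

best_m0 = m0_grid[np.argmax(lnl_vs_m0)]
best_gamma = gamma_grid[np.argmax(lnl_vs_gamma)]
print(f"peak of m0 scan:    {best_m0:.2f} MeV")
print(f"peak of Gamma scan: {best_gamma:.2f} MeV")
```

Plotting `lnl_vs_m0` against `m0_grid` and `lnl_vs_gamma` against `gamma_grid` gives the two requested curves; both should peak close to the input values.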


iv) The variables $m_0$ and $\Gamma$ are correlated, and to find the global maximum you generally
need to iterate, taking it in turns to adjust each of the two variables and find the
peak in the other. (Remember, for the plots you drew above, you already "knew" the
answer for the complementary variable in each case, which is why they peaked in the
right place straight away).

Use Markov Chain Monte Carlo to explore the two-dimensional parameter space.
Make plots of (i) the joint distribution in $m_0$ and $\Gamma$ and (ii) marginalised distributions in $m_0$ and $\Gamma$ separately.

Use emcee to explore the parameter space.
See http://dfm.io/emcee/current/ for documentation.
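
To show what the sampler is doing, the sketch below uses a hand-written random-walk Metropolis algorithm in place of emcee, so it is a stand-in for the requested tool rather than an example of it. It assumes flat priors with $\Gamma > 0$; the proposal widths, starting point, chain length and seed are also assumptions.

```python
import numpy as np

def log_prob(theta, masses):
    """Log posterior for (m0, Gamma) with flat priors and Gamma > 0."""
    m0, gamma = theta
    if gamma <= 0.0:
        return -np.inf
    return masses.size * np.log(gamma / (2.0 * np.pi)) - np.sum(
        np.log((masses - m0) ** 2 + 0.25 * gamma**2)
    )

rng = np.random.default_rng(11)
masses = 784.0 + 6.0 * np.tan((rng.uniform(size=10_000) - 0.5) * np.pi)

n_steps, burn_in = 6000, 1000
step = np.array([0.1, 0.2])        # proposal widths in m0 and Gamma (MeV)
chain = np.empty((n_steps, 2))
theta = np.array([783.0, 11.0])    # deliberately offset starting point
current = log_prob(theta, masses)
n_accept = 0
for i in range(n_steps):
    proposal = theta + step * rng.normal(size=2)
    candidate = log_prob(proposal, masses)
    # Metropolis rule: accept with probability min(1, exp(candidate - current)).
    if np.log(rng.uniform()) < candidate - current:
        theta, current = proposal, candidate
        n_accept += 1
    chain[i] = theta

samples = chain[burn_in:]
m0_mean, gamma_mean = samples.mean(axis=0)
m0_std, gamma_std = samples.std(axis=0)

# Joint distribution on a grid; summing over an axis gives each marginal.
joint, m0_edges, gamma_edges = np.histogram2d(samples[:, 0], samples[:, 1], bins=30)
m0_marginal, gamma_marginal = joint.sum(axis=1), joint.sum(axis=0)
print(f"m0    = {m0_mean:.2f} +/- {m0_std:.2f} MeV")
print(f"Gamma = {gamma_mean:.2f} +/- {gamma_std:.2f} MeV")
print(f"acceptance fraction = {n_accept / n_steps:.2f}")
```

With emcee, the same `log_prob` function could be passed to `emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=[masses])`, and the flattened chain handed to `corner.corner` to produce the joint and marginal plots in one call.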

## Q5 Generating arbitrarily distributed random numbers

Utilising the uniform random number generator from your favourite programming language,
write functions that will efficiently return $N$ random numbers corresponding to (a) a supplied
empirical distribution $\{x_i, p_i(x_i)\}$, (b) a user-supplied 1-d PDF $p(x)$ and (c) a user-supplied
2-d PDF $p(x, y)$.
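
One possible approach to all three parts is to build a cumulative distribution and invert it. The sketch below assumes the PDFs are vectorised callables defined on a finite range and tabulates them on a grid, so it is approximate at the level of the grid spacing. All function names and default grid sizes are illustrative assumptions.

```python
import numpy as np

def sample_discrete(x, p, n, rng):
    """(a) Draw from tabulated values x_i with weights p_i via the inverse CDF."""
    cdf = np.cumsum(p)
    cdf = cdf / cdf[-1]
    idx = np.searchsorted(cdf, rng.uniform(size=n), side="right")
    return np.asarray(x)[idx]

def sample_pdf_1d(pdf, lo, hi, n, rng, n_grid=2001):
    """(b) Tabulate pdf on [lo, hi], integrate to a CDF, invert by interpolation."""
    grid = np.linspace(lo, hi, n_grid)
    density = pdf(grid)
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)  # trapezoid rule
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]
    return np.interp(rng.uniform(size=n), cdf, grid)

def sample_pdf_2d(pdf, x_range, y_range, n, rng, n_grid=400):
    """(c) Pick a grid cell with probability ~ pdf at its centre, then jitter inside it."""
    x_edges = np.linspace(x_range[0], x_range[1], n_grid + 1)
    y_edges = np.linspace(y_range[0], y_range[1], n_grid + 1)
    x_mid = 0.5 * (x_edges[1:] + x_edges[:-1])
    y_mid = 0.5 * (y_edges[1:] + y_edges[:-1])
    weights = pdf(x_mid[:, None], y_mid[None, :]).ravel()
    cdf = np.cumsum(weights)
    cdf = cdf / cdf[-1]
    cell = np.searchsorted(cdf, rng.uniform(size=n), side="right")
    ix, iy = np.unravel_index(cell, (n_grid, n_grid))
    x = x_edges[ix] + (x_edges[1] - x_edges[0]) * rng.uniform(size=n)
    y = y_edges[iy] + (y_edges[1] - y_edges[0]) * rng.uniform(size=n)
    return x, y

rng = np.random.default_rng(0)
dice = sample_discrete([1, 2, 3], [0.2, 0.3, 0.5], 100_000, rng)
gauss = sample_pdf_1d(lambda x: np.exp(-0.5 * x**2), -6.0, 6.0, 100_000, rng)
xs, ys = sample_pdf_2d(lambda x, y: np.exp(-0.5 * (x**2 + y**2)),
                       (-5.0, 5.0), (-5.0, 5.0), 100_000, rng)
print(np.mean(dice == 3), gauss.std(), xs.std(), ys.std())
```

The PDFs passed in need not be normalised, because each CDF is divided by its final value. All three functions are vectorised, so the cost per random number is dominated by a binary search or an interpolation rather than a Python-level loop.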
