# **Applied Statistics Worksheet 2:  Parameter estimation**

***SEMT20002 - Nikolai Bode***

This worksheet focusses on the material covered in Lecture 2: Methods to estimate the parameter values or quantities of a probability distribution. Quite a few elements of this worksheet are non-examinable (sections 2-4).

The instructions for what you are expected to do are either shown in the text or given as comments in the code boxes.

## **Packages**

Whilst packages are mentioned (and used in the solutions provided), please do not simply run code to import these packages, in case some of the imported packages are not compatible with what you have already installed.

From the solutions, only copy the relevant lines into your own file and then execute them, checking that they work with your existing installed packages, and only import new packages when needed. For plotting results in particular, we suggest you focus on using functions you know already rather than trying to exactly replicate the plots shown in lectures or in the solutions.

For this worksheet, the functionalities required are contained within `scipy.stats` and `scipy.optimize`.


## **1. Maximum Likelihood (ML)**
### **ML estimate for the parameter of a Bernoulli distribution**

Estimate the parameter $ p $ of a Bernoulli distribution from a sample of $ n = 10 $ observations using the ML method:
1. Generate a sample $ x $ of size $ n = 10 $ from a Bernoulli distribution with parameter $ p = 0.75 $ (as for Worksheet 1, you can use the `scipy.stats` package to create distributions).
2. Compute and plot the Likelihood function for the sample $ x $:    $L(p; x) = \prod_{i=1}^{n} P(x_i; p).$ For this, it is convenient to define a function of $ p $, e.g. `likelihood = lambda p: np.prod(stats.bernoulli(p).pmf(x))`.
3. Compute and plot the log-Likelihood function for the sample $ x $:    $\log(L(p)) = \sum_{i=1}^{n} \log P(x_i; p).$
4. Use Scipy's `fmin` to find $ p_L $, the value of $ p $ that maximises $ L(p) $, and $ p_{\log L} $, the value of $ p $ that maximises $ \log(L(p)) $ (note that changing the sign of a function turns a maximisation into a minimisation problem).
5. Compute the analytical ML estimate for the sample $ x $:     $ \hat{p}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i.$
   Does $ \hat{p}_{MLE} = p_L = p_{\log L} $?    In general, $ \hat{p}_{MLE} \neq p $, the true parameter value.
   Why?
6. Increase the sample size $ n = 100, 1000, \ldots $ and observe the likelihood functions and the estimates of $ p $.
   What do you notice?
7. Check that the `fit` function in `scipy.stats` gives the same result you found.
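
The steps above can be sketched as follows. This is one possible approach rather than the provided solution: the function names (`likelihood`, `log_likelihood`, `ml_estimates`), the fixed random seed and the starting point `p0 = 0.5` are assumptions made for illustration. Values of $ p $ outside $(0, 1)$ are assigned zero likelihood so that `fmin` cannot wander outside the valid range.

```python
import numpy as np
from scipy import stats
from scipy.optimize import fmin


def likelihood(p, x):
    """Bernoulli likelihood L(p; x): product of the PMF over the sample."""
    p = np.ravel(p)[0]  # fmin passes p as a one-element array
    if not 0.0 < p < 1.0:
        return 0.0  # assumption: treat invalid p as having zero likelihood
    return np.prod(stats.bernoulli.pmf(x, p))


def log_likelihood(p, x):
    """Bernoulli log-likelihood: sum of the log-PMF over the sample."""
    p = np.ravel(p)[0]
    if not 0.0 < p < 1.0:
        return -np.inf
    return np.sum(stats.bernoulli.logpmf(x, p))


def ml_estimates(x, p0=0.5):
    """Return (p_L, p_logL, analytical estimate) for the sample x."""
    # Maximising f is the same as minimising -f.
    p_L = fmin(lambda p: -likelihood(p, x), p0, disp=False)[0]
    p_logL = fmin(lambda p: -log_likelihood(p, x), p0, disp=False)[0]
    return p_L, p_logL, np.mean(x)


rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = stats.bernoulli(0.75).rvs(size=10, random_state=rng)

# Values on a grid of p, ready to be passed to a plotting function.
p_grid = np.linspace(0.01, 0.99, 99)
L_grid = np.array([likelihood(p, x) for p in p_grid])
logL_grid = np.array([log_likelihood(p, x) for p in p_grid])

print(ml_estimates(x))
```

Rerunning with a larger `size` in the call to `rvs` covers step 6; `L_grid` and `logL_grid` can be plotted against `p_grid` with whichever plotting function you already know.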

Sometimes `fmin` does not converge and returns wrong estimates. Starting from a different initial condition can solve the problem. In other cases the cause is numerical error, for example when the function to be optimised takes values too small or too large for **Python**'s floating-point numbers to represent; this can happen for Likelihood functions for large samples (why?).
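
A small numerical illustration of this issue, assuming a Bernoulli sample of size 5000 (the sample size and seed are arbitrary choices): the likelihood is a product of thousands of factors smaller than 1, while the log-likelihood is a sum of moderately sized negative numbers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
x_large = stats.bernoulli(0.75).rvs(size=5000, random_state=rng)

p = 0.75
L = np.prod(stats.bernoulli.pmf(x_large, p))       # product of 5000 factors < 1
logL = np.sum(stats.bernoulli.logpmf(x_large, p))  # sum of 5000 log-factors

# The product underflows to exactly 0.0, the sum stays finite.
print(L, logL)
```

Because the likelihood is reported as exactly zero over the whole range of $ p $, an optimiser has nothing to climb; the log-likelihood keeps the information.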

### **ML estimate for the parameters of a Normal distribution**

Follow the same steps as above to estimate the parameters $\mu$ and $\sigma$ of a Normal distribution from a sample of $n = 10$ observations using the ML method:

1. Generate a sample $x$ of size $n = 10$ from a Normal distribution with parameters $\mu = 0$ and $\sigma = 1$.
2. Define the Likelihood and log-Likelihood functions for the sample $x$ as a function of parameters $\mu$ and $\sigma$ ( `stats.norm(mu, sigma).pdf(x)` gives the value of the Normal PDF at $x$).
3. Create a grid of $\mu$ and $\sigma$ values and compute the likelihood function for each pair $(\mu_i, \sigma_j)$ on the grid.
4. Use matplotlib's `contourf(X, Y, Z)` to plot the Likelihood function for the sample $x$, $L(\mu,\sigma; x) = \prod_{i=1}^{n} P(x_i; \mu,\sigma)$.
5. Use SciPy's `fmin` to find $(\hat{\mu},\hat{\sigma})$, the values of the parameters that maximise $L(\mu,\sigma)$.
6. Compute the analytical ML estimate $(\hat{\mu}_{MLE},\hat{\sigma}_{MLE})$. Is $(\hat{\mu}_{MLE},\hat{\sigma}_{MLE}) = (\hat{\mu},\hat{\sigma})$? Why?
7. Increase the sample size $n = 100, 1000, ...$ and observe the likelihood functions and the estimates of $\mu$ and $\sigma$. What do you notice?
8. Check that the `fit` function in `scipy.stats` gives the same result you found.
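
A minimal sketch of these steps, under a few assumptions: the helper names (`neg_log_likelihood`, `likelihood_grid`), the grid ranges, the seed and the starting point `[0.5, 2.0]` are illustrative choices. Note that the sketch passes the negative log-likelihood to `fmin` instead of the likelihood itself; both have their maximum at the same parameter values, and the log version is numerically safer. Plotting is left out, but `M`, `S` and `Z` have the shapes expected by `contourf(X, Y, Z)`.

```python
import numpy as np
from scipy import stats
from scipy.optimize import fmin


def neg_log_likelihood(theta, x):
    """Negative Normal log-likelihood for theta = (mu, sigma)."""
    mu, sigma = theta
    if sigma <= 0:
        return np.inf  # assumption: reject non-positive sigma outright
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))


def likelihood_grid(x, mus, sigmas):
    """Likelihood L(mu, sigma; x) evaluated on a grid of (mu, sigma) pairs."""
    M, S = np.meshgrid(mus, sigmas)
    # Broadcast to shape (len(sigmas), len(mus), n), then multiply over the sample.
    pdf = stats.norm.pdf(x[None, None, :], loc=M[..., None], scale=S[..., None])
    return M, S, np.prod(pdf, axis=-1)


rng = np.random.default_rng(2)  # arbitrary seed
x = stats.norm(0, 1).rvs(size=10, random_state=rng)

M, S, Z = likelihood_grid(x, np.linspace(-2, 2, 81), np.linspace(0.2, 3, 71))

theta_hat = fmin(neg_log_likelihood, x0=[0.5, 2.0], args=(x,), disp=False)
mu_mle, sigma_mle = np.mean(x), np.std(x)  # np.std divides by n (ddof=0)
mu_fit, sigma_fit = stats.norm.fit(x)

print(theta_hat, (mu_mle, sigma_mle), (mu_fit, sigma_fit))
```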


## **2. Method of Moments (MoM) *(non-examinable)***

### **MoM estimate for the parameter of a Poisson distribution**

1. Calculate $\hat{\lambda}_{MoM}$ the method of moments estimator for the parameter $\lambda$ of a Poisson distribution with PMF $P(x;\lambda) = e^{-\lambda}\frac{\lambda^x}{x!}$, with $x = 0, 1, ...$ and $\lambda > 0$.

2. Generate a sample $x$ of $n = 10$ observations from a Poisson distribution with parameter $\lambda = 10$ and estimate it using $\hat{\lambda}_{MoM}$, the estimator derived in 1.

3. Consider increasing sample sizes $n = 100, 1000$ and for each $n$ draw 1000 samples, compute their 1000 MoM estimates $\hat{\lambda}_{MoM}$, and calculate the estimates' standard deviation, $\sigma_{\hat{\lambda}}$. Plot $\sigma_{\hat{\lambda}}$ against $n$ and verify that the "spread" of the estimates decreases in proportion to $1/\sqrt{n}$ as the sample size increases.
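
The simulation in step 3 could look roughly like this. The sketch assumes the MoM estimator obtained by matching the first moment, i.e. the sample mean (since $E[X] = \lambda$ for a Poisson distribution); `mom_std` is a hypothetical helper name, and the slope of a straight-line fit on log-log axes is used here as a stand-in for reading the trend off a plot.

```python
import numpy as np
from scipy import stats


def mom_std(lam, n, n_rep, rng):
    """Standard deviation of n_rep MoM estimates, each from a sample of size n."""
    samples = stats.poisson(lam).rvs(size=(n_rep, n), random_state=rng)
    estimates = samples.mean(axis=1)  # first-moment MoM estimate per sample
    return estimates.std()


rng = np.random.default_rng(3)  # arbitrary seed
lam = 10
sizes = np.array([10, 100, 1000])
spread = np.array([mom_std(lam, n, 1000, rng) for n in sizes])

# On log-log axes a 1/sqrt(n) decay appears as a straight line of slope -1/2.
slope = np.polyfit(np.log10(sizes), np.log10(spread), 1)[0]
print(spread, slope)
```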



## **3. Unbiased Estimators *(non-examinable)***

Consider the RV $ X $ with the PDF $ P_X(x; \sigma) = \frac{1}{2\sigma} e^{-\frac{|x|}{\sigma}} $ for $ x \in \mathbb{R} $ and $ \sigma > 0 $.

1. Calculate $ \hat{\sigma}_{MLE} $, the Maximum Likelihood estimator of $ \sigma $.
2. Calculate $ \hat{\sigma}_{MoM} $, the Method of Moments estimator of $ \sigma $.
3. Do the two methods give the same estimator?
4. Check numerically whether $ \hat{\sigma}_{MLE} $ and $ \hat{\sigma}_{MoM} $ look unbiased, asymptotically unbiased or biased: for each of the sample sizes $ n = 5, 10, 100, 1000 $, draw 20000 samples and plot the histograms and means of the distributions of the estimates. Is there a better estimator? To generate $ n $ random numbers from the PDF $ P_X(x; \sigma) $, use the code `x = stats.expon(scale=sigma).rvs(n) * (2 * stats.binom(1, 0.5).rvs(n) - 1)` (why?).
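
One way to set up the numerical check in step 4 is sketched below. It assumes the estimators $\hat{\sigma}_{MLE} = \frac{1}{n}\sum_i |x_i|$ (from setting the derivative of the log-likelihood to zero) and $\hat{\sigma}_{MoM} = \sqrt{\frac{1}{2n}\sum_i x_i^2}$ (from the second moment, $E[X^2] = 2\sigma^2$, because the first moment is zero); verify these against your own answers to steps 1 and 2. The value $\sigma = 2$, the seed and the reduced number of replications (2000 rather than 20000, to keep memory use small) are arbitrary choices.

```python
import numpy as np
from scipy import stats


def laplace_sample(sigma, size, rng):
    """Exponential magnitudes with a random sign, as suggested in step 4."""
    magnitude = stats.expon(scale=sigma).rvs(size=size, random_state=rng)
    sign = 2 * stats.binom(1, 0.5).rvs(size=size, random_state=rng) - 1
    return magnitude * sign


def sigma_mle(x):
    """Assumed ML estimator: mean absolute value (one estimate per row)."""
    return np.mean(np.abs(x), axis=-1)


def sigma_mom(x):
    """Assumed MoM estimator based on the second moment."""
    return np.sqrt(np.mean(x**2, axis=-1) / 2)


rng = np.random.default_rng(4)  # arbitrary seed
sigma, n_rep = 2.0, 2000
results = {}
for n in [5, 10, 100, 1000]:
    x = laplace_sample(sigma, (n_rep, n), rng)  # one sample per row
    results[n] = (sigma_mle(x).mean(), sigma_mom(x).mean())
    print(n, results[n])
```

The arrays returned by `sigma_mle(x)` and `sigma_mom(x)` hold one estimate per sample, so they can be passed directly to a histogram function.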


## **4. Comparison of estimators for the mean of non-skewed and skewed distributions *(non-examinable)***

Use simulations to investigate whether the mean or median is a better estimator of the mean (parameter $\mu$) of a Normal distribution. Compare the two estimators by analysing (1) bias and (2) standard deviation (spread) of the estimates.
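
A possible starting point for this simulation, with `compare_estimators` as a hypothetical helper and the sample size, number of replications and seed chosen arbitrarily. It returns the bias and the standard deviation of each estimator so that the two can be compared side by side.

```python
import numpy as np
from scipy import stats


def compare_estimators(mu, sigma, n, n_rep, rng):
    """Bias and spread of the sample mean and median over n_rep Normal samples."""
    x = stats.norm(mu, sigma).rvs(size=(n_rep, n), random_state=rng)
    means = x.mean(axis=1)
    medians = np.median(x, axis=1)
    return {
        "mean": (means.mean() - mu, means.std()),        # (bias, spread)
        "median": (medians.mean() - mu, medians.std()),  # (bias, spread)
    }


rng = np.random.default_rng(5)  # arbitrary seed
result = compare_estimators(0.0, 1.0, n=100, n_rep=5000, rng=rng)
for name, (bias, spread) in result.items():
    print(f"{name}: bias = {bias:.4f}, spread = {spread:.4f}")
```

Replacing `stats.norm(mu, sigma)` with a skewed distribution would allow the same comparison for the skewed case mentioned in the section title.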
