In [1]:
'''
Import the libraries used throughout this notebook.
Run this cell first.
'''
import numpy as np
from scipy import stats
import scipy
import warnings
warnings.simplefilter('ignore', DeprecationWarning)

# Statistical Estimation and Sampling Distributions

## Point estimates

### Parameters

- Parameter: in statistical inference, a quantity $\theta$ that determines the shape of an unknown probability distribution
    - Goal: estimate the unknown parameters to obtain the distribution

### Statistics

- Statistic: function of a random sample (e.g. sample mean, variance, quantile...)
- Statistics are random variables whose observed values can be calculated from a set of observed data

### Estimation

- Estimation: procedure of "guessing" properties of the population from which data are collected
- Point estimate: statistic $\hat{\theta}$ representing a "best guess" of the real $\theta$ value

## Properties of Point Estimates

### Unbiased Estimates

- Unbiased point estimate: a $\hat{\theta}$ for a parameter $\theta$ satisfying:
\begin{equation}
    E(\hat{\theta}) = \theta
\end{equation}
- Bias definition:
\begin{equation}
    bias(\hat{\theta}) = E(\hat{\theta})-\theta
\end{equation}


- Point estimate of a population mean: given a random sample $X_1, \cdots, X_n$ from a distribution with mean $\mu$, the sample mean $\bar{X}$ is an unbiased estimate of $\mu$
- Point estimate of a population variance: given a random sample $X_1, \cdots, X_n$ from a distribution with variance $\sigma ^2$, the sample variance $S ^2$ (computed with divisor $n-1$) is an unbiased estimate of $\sigma ^2$
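
The two unbiasedness claims above can be checked by simulation. The following is a minimal sketch, assuming a normal population; the values of `mu`, `sigma`, `n`, and `reps` are arbitrary choices for illustration and do not come from the notes. It averages each estimate over many simulated samples and also shows the variance computed with divisor $n$, which is biased.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000   # illustrative values (assumptions)

# reps independent random samples of size n, one per row
samples = rng.normal(mu, sigma, size=(reps, n))

mean_of_xbar = samples.mean(axis=1).mean()           # approximates E(X-bar)
mean_of_s2 = samples.var(axis=1, ddof=1).mean()      # approximates E(S^2), divisor n - 1
mean_of_biased = samples.var(axis=1, ddof=0).mean()  # divisor n, a biased estimate

print(mean_of_xbar, mean_of_s2, mean_of_biased)
```

The first two averages should land close to $\mu$ and $\sigma^2$, while the third should land close to $\frac{n-1}{n}\sigma^2$.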

### Minimum Variance Estimates

- Minimum variance unbiased estimate: unbiased point estimate whose variance is smaller than that of any other unbiased point estimate


### Relative efficiency

- Relative efficiency of an unbiased point estimate $\hat{\theta}_1$ to another unbiased point estimate $\hat{\theta}_2$:
\begin{equation}
    \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)}
\end{equation}
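
As a rough numerical illustration of this ratio, the sketch below compares the sample mean (playing the role of $\hat{\theta}_1$) with the sample median (playing the role of $\hat{\theta}_2$) as estimates of the center of a normal distribution. The choice of estimators, the normal population, and the values of `n` and `reps` are assumptions made for the example; both estimates are unbiased here because the distribution is symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 1.0, 101, 20_000   # illustrative values (assumptions)

samples = rng.normal(mu, sigma, size=(reps, n))
var_mean = samples.mean(axis=1).var()          # Var(theta_hat_1), estimated by simulation
var_median = np.median(samples, axis=1).var()  # Var(theta_hat_2), estimated by simulation

# Relative efficiency of the mean to the median: Var(theta_hat_2) / Var(theta_hat_1)
rel_eff = var_median / var_mean
print(rel_eff)
```

A ratio above 1 indicates that the sample mean has the smaller variance. For normal data the large-sample value of this ratio is $\pi/2 \approx 1.57$.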

### Mean squared error (MSE)

\begin{equation}
    MSE(\hat{\theta}) = E\left[ (\hat{\theta} - \theta)^2 \right]
\end{equation}
- Alternative form: 
\begin{equation}
    MSE(\hat{\theta}) = Var( \hat{\theta}) + bias^2 (\hat{ \theta })
\end{equation}
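
The decomposition can be verified numerically. The sketch below is one possible check, assuming a normal population and using the variance estimate with divisor $n$ as $\hat{\theta}$ (a deliberately biased choice made for illustration); `sigma2`, `n`, and `reps` are arbitrary values.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 100_000   # illustrative values (assumptions)

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = samples.var(axis=1, ddof=0)    # one estimate of sigma^2 per simulated sample

mse = np.mean((est - sigma2) ** 2)   # E[(theta_hat - theta)^2]
variance = est.var()                 # Var(theta_hat)
bias = est.mean() - sigma2           # E(theta_hat) - theta

print(mse, variance + bias ** 2)
```

The two printed numbers agree up to floating-point error, since the identity also holds for the empirical moments. The bias should be close to $-\sigma^2/n$.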

## Sampling Distributions

### Sample Proportion

- If $X \sim B(n,p)$ and $n$ is large, then the sample proportion $\hat{p} = \frac{X}{n}$ has approximately the distribution $N(p, \frac{p(1-p)}{n})$
- Standard error of $\hat{p}$:
\begin{equation}
    s.e.(\hat{p}) = \sqrt{ \frac {p(1-p)}{n}}
\end{equation}
Since $p$ is unknown in practice, for large $n$ the standard error $s.e.(\hat{p})$ is approximated by $\sqrt{ \frac{ \hat{p} (1 - \hat{p} )}{n}}$.
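
To make the standard error formula concrete, the sketch below simulates many binomial observations and compares the spread of $\hat{p}$ with $\sqrt{p(1-p)/n}$. The values of `n`, `p`, and `reps` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 200, 0.3, 100_000   # illustrative values (assumptions)

x = rng.binomial(n, p, size=reps)   # reps independent draws of X ~ B(n, p)
p_hat = x / n                       # one sample proportion per draw

se_theory = np.sqrt(p * (1 - p) / n)   # s.e.(p_hat) from the formula
se_empirical = p_hat.std()             # spread of p_hat across the simulation

# Plug-in standard error computed from a single observed proportion
p_hat_one = p_hat[0]
se_plugin = np.sqrt(p_hat_one * (1 - p_hat_one) / n)

print(se_theory, se_empirical, se_plugin)
```

The plug-in value uses only one observed $\hat{p}$, which is what is available in practice when $p$ is unknown.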

### Sample Mean

- Distribution of sample mean: given $X_1, \cdots, X_n$ a random sample from a distribution with mean $\mu$ and variance $\sigma ^2$, the central limit theorem states that, approximately:
\begin{equation}
    \bar{X} \sim N \left( \mu, \frac{\sigma^2}{n} \right) \text{ for large } n
\end{equation}

- Standard error of the sample mean: $s.e.(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
- When $\sigma$ is unknown and $n$ is large, then the standard error is approximated by $\frac{s}{\sqrt{n}}$
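
The sketch below illustrates these statements for a non-normal population. It assumes an exponential distribution with mean 2 (so $\mu = \sigma = 2$) and arbitrary values of `n` and `reps`; none of these come from the notes. It compares $\sigma / \sqrt{n}$ with the simulated spread of $\bar{X}$, then computes $s / \sqrt{n}$ from a single sample, both by hand and with `scipy.stats.sem`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
scale, n, reps = 2.0, 50, 50_000   # illustrative values (assumptions)
mu, sigma = scale, scale           # an exponential has mean = standard deviation = scale

samples = rng.exponential(scale, size=(reps, n))
xbar = samples.mean(axis=1)        # one sample mean per simulated sample

se_theory = sigma / np.sqrt(n)     # s.e.(X-bar) when sigma is known
se_empirical = xbar.std()          # spread of X-bar across the simulation

# When sigma is unknown: estimate the standard error from one sample
one = samples[0]
se_estimated = one.std(ddof=1) / np.sqrt(n)
se_scipy = stats.sem(one)          # same quantity; sem uses ddof=1 by default

print(se_theory, se_empirical, se_estimated, se_scipy)
```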

### Sample Variance

- Distribution of sample variance: given $X_1, \cdots, X_n$ a random sample from $N( \mu, \sigma^2)$, then:
\begin{equation}
    \frac{(n-1) S^2}{\sigma^2} \sim \chi _{n-1}^2
\end{equation}

- t-statistic: given $X_1, \cdots, X_n$ a random sample from $N( \mu, \sigma^2)$, then
\begin{equation}
    T = \frac{\sqrt{n}( \bar{X} - \mu) }{S} \sim t_{n-1}
\end{equation}
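
Both distributional results can be checked by simulation. The following sketch assumes arbitrary values for `mu`, `sigma`, `n`, and `reps`. It compares the average of $(n-1)S^2/\sigma^2$ with $n-1$ (the mean of a $\chi^2_{n-1}$ variable) and the fraction of $|T|$ values beyond the upper 2.5% point of $t_{n-1}$ with 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, reps = 10.0, 3.0, 8, 50_000   # illustrative values (assumptions)

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)              # sample standard deviation S

chi2_stat = (n - 1) * s**2 / sigma**2        # should follow chi-square with n - 1 df
t_stat = np.sqrt(n) * (xbar - mu) / s        # should follow t with n - 1 df

chi2_mean = chi2_stat.mean()                 # compare with n - 1
t_crit = stats.t.ppf(0.975, df=n - 1)        # upper 2.5% point of t_{n-1}
tail_frac = np.mean(np.abs(t_stat) > t_crit) # compare with 0.05

print(chi2_mean, tail_frac)
```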

## Constructing Parameter Estimates

### The Method of Moments

- Method of Moments point estimate (MME) for one parameter: given a data set of observations $x_1, \cdots, x_n$ from a probability distribution depending on one parameter $\theta$, the MME $\hat{\theta}$ of $\theta$ is found by solving the following equation
\begin{equation}
    \bar{x} = E(X)
\end{equation}

- Method of Moments point estimate (MME) for two parameters: the estimates of the two unknown parameters are found by solving:
\begin{equation}
    \bar{x} = E(X)
\end{equation}
and
\begin{equation}
    s^2 = Var(X)
\end{equation}
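
As a worked sketch of both cases, the code below uses two distributions chosen purely for illustration (they are not named in the notes). For an exponential distribution with rate $\lambda$, $E(X) = 1/\lambda$, so solving $\bar{x} = 1/\lambda$ gives $\hat{\lambda} = 1/\bar{x}$. For a gamma distribution with shape $k$ and scale $\theta$, $E(X) = k\theta$ and $Var(X) = k\theta^2$, so the two equations give $\hat{\theta} = s^2/\bar{x}$ and $\hat{k} = \bar{x}^2/s^2$.

```python
import numpy as np

rng = np.random.default_rng(6)

# One parameter: exponential with rate lam, where E(X) = 1 / lam
x1 = rng.exponential(scale=1 / 1.5, size=20_000)   # true rate 1.5 (assumption)
lam_mme = 1 / x1.mean()                            # solves x-bar = 1 / lam

# Two parameters: gamma with shape k and scale theta,
# where E(X) = k * theta and Var(X) = k * theta**2
x2 = rng.gamma(shape=3.0, scale=2.0, size=20_000)  # true k = 3, theta = 2 (assumption)
xbar, s2 = x2.mean(), x2.var(ddof=1)
theta_mme = s2 / xbar                              # from dividing Var(X) by E(X)
k_mme = xbar**2 / s2                               # from E(X)**2 / Var(X)

print(lam_mme, k_mme, theta_mme)
```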

### Maximum Likelihood Estimates

- Maximum likelihood estimate for one parameter: given a data set of observations $x_1, \cdots, x_n$ from a probability distribution $f(x; \theta)$ then
\begin{equation}
    \hat{\theta} = \arg\max_{\theta} L(\theta), \quad L(\theta) = f(x_1; \theta) \times \cdots \times f(x_n; \theta)
\end{equation}
where $L(\theta)$ is the likelihood function

- Maximum likelihood estimate for two parameters: $\hat{\theta}_1$ and $\hat{\theta}_2$ are the values of the parameters at which the likelihood function $L(\theta_1, \theta_2)$ is maximized
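
A minimal numerical sketch of both cases follows. The distributions are assumptions chosen for illustration: an exponential with rate $\lambda$ for the one-parameter case and a normal distribution for the two-parameter case. Because the logarithm is increasing, maximizing $\log L$ gives the same maximizer as maximizing $L$, so the code minimizes the negative log-likelihood. The helper `neg_log_lik` is a hypothetical name introduced here.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(8)

# One parameter: exponential with density f(x; lam) = lam * exp(-lam * x)
x = rng.exponential(scale=1 / 1.5, size=500)   # true rate 1.5 (assumption)

def neg_log_lik(lam):
    """Negative log-likelihood of the exponential sample x at rate lam."""
    return -(len(x) * np.log(lam) - lam * x.sum())

res = optimize.minimize_scalar(neg_log_lik, bounds=(1e-6, 20.0), method="bounded")
lam_mle = res.x                 # numerical maximizer of the likelihood
lam_closed = 1 / x.mean()       # maximizer obtained by differentiation

# Two parameters: normal with mean mu and standard deviation sigma
y = rng.normal(10.0, 3.0, size=500)
mu_mle, sigma_mle = stats.norm.fit(y)   # maximum likelihood fit of (loc, scale)

print(lam_mle, lam_closed, mu_mle, sigma_mle)
```

For the normal case the fitted scale equals the standard deviation computed with divisor $n$, not $n-1$, so the MLE of $\sigma^2$ differs from the unbiased sample variance $S^2$.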

### MLE for $U(0, \theta)$

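- For some distributions, the MLE cannot be found by differentiation; instead, examine the shape of the likelihood function itself
- For $U(0, \theta)$, $L(\theta) = \theta^{-n}$ when $\theta \geq \max \{ x_1, \cdots, x_n \}$ and $0$ otherwise, so $L$ is maximized at the smallest admissible value of $\theta$
- MLE of $\theta$: $\hat{\theta} = \max \{ X_1, \cdots, X_n \}$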
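
The shape of this likelihood can be inspected directly. The sketch below assumes arbitrary values of `theta` and `n` and evaluates the likelihood on a few points around the sample maximum; the function name `likelihood` is a hypothetical helper. It also estimates $E(\hat{\theta})$ by simulation, which for this estimate equals $\frac{n}{n+1}\theta$, so the MLE is slightly biased downward.

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n = 4.0, 20   # illustrative values (assumptions)
x = rng.uniform(0, theta, size=n)

def likelihood(t, data):
    """L(t) for U(0, t): t**(-n) if every observation is <= t, else 0."""
    return t ** (-len(data)) if t >= data.max() else 0.0

theta_hat = x.max()   # MLE: the largest observation

# Evaluate L below, at, and above the sample maximum
grid = theta_hat * np.array([0.5, 0.9, 1.0, 1.1, 1.5])
values = [likelihood(t, x) for t in grid]
print(values)         # zero below theta_hat, largest at theta_hat, then decreasing

# Average of the MLE over many simulated samples, compared with n * theta / (n + 1)
reps = 50_000
maxes = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
print(maxes.mean(), n * theta / (n + 1))
```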