# 12. Estimators, Properties of Point Estimators and Methods of Estimation
<hr>

An **estimator** is a rule, often expressed as a formula, that tells us how to calculate the value of an estimate based on the measurements contained in a sample.

Let $\hat{\theta}$ be a point estimator for a parameter $\theta$. Then $\hat{\theta}$ is an **unbiased** estimator if $E(\hat{\theta})=\theta$. If $E(\hat{\theta}) \neq \theta$, then $\hat{\theta}$ is said to be **biased**.

The **bias** of a point estimator $\hat{\theta}$ is given by:

$$B(\hat{\theta}) = E(\hat{\theta}) - \theta$$

- If $B(\hat{\theta}) < 0$, then $\hat{\theta}$ tends to underestimate $\theta$.
- If $B(\hat{\theta}) > 0$, then $\hat{\theta}$ tends to overestimate $\theta$.

The **mean square error** of a point estimator $\hat{\theta}$ is:

$$\text{MSE} = E\left[ (\hat{\theta} - \theta)^2 \right] = V(\hat{\theta}) + \left[ B(\hat{\theta}) \right]^2$$
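The decomposition of the MSE into variance plus squared bias can be checked numerically. The sketch below is only an illustration: the "shrunken mean" estimator $0.9\,\bar{Y}$, the normal population, and the values of `mu`, `sigma`, `n` and `reps` are all assumptions chosen for the demonstration and are not part of the notes.

```python
import random

def shrunken_mean(sample, shrink=0.9):
    """Hypothetical biased estimator of the mean: shrink * sample mean."""
    return shrink * sum(sample) / len(sample)

def bias_variance_mse(estimates, theta):
    """Empirical bias, variance and MSE of repeated estimates of theta."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    bias = mean_est - theta
    variance = sum((e - mean_est) ** 2 for e in estimates) / m
    mse = sum((e - theta) ** 2 for e in estimates) / m
    return bias, variance, mse

rng = random.Random(0)                     # fixed seed so the run is repeatable
mu, sigma, n, reps = 5.0, 2.0, 10, 20000   # assumed values, illustration only

# Repeat the experiment many times: draw a sample, compute the estimate.
estimates = [
    shrunken_mean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]
bias, variance, mse = bias_variance_mse(estimates, mu)

print(f"bias     = {bias:.4f}")
print(f"variance = {variance:.4f}")
print(f"MSE      = {mse:.4f}")
print(f"V + B^2  = {variance + bias ** 2:.4f}")
```

The last two printed values should agree up to floating-point rounding, and the bias should come out negative because this estimator shrinks the sample mean toward zero.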

## 12.1 Estimating Variance
<hr>

The population variance $\sigma^2$ could be estimated by:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2$$

OR...

$$s'^2 = \frac{1}{n} \sum_{i=1}^n (Y_i - \bar{Y})^2$$

Which one is an unbiased estimator?

*Note that no assumption is made about the shape of the distribution. We only assume that the $Y_i$ are independent with $E(Y_i)=\mu$ and $V(Y_i)=\sigma^2$.*

$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2$$

Where:

$$\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n Y_i^2 - n\bar{Y}^2$$

Taking expectations:

$$E\left[ \sum_{i=1}^n Y_i^2 - n\bar{Y}^2 \right] = \sum_{i=1}^n E(Y_i^2) - n E(\bar{Y}^2)$$

Where:

\begin{align}
E(Y_i^2) &= V(Y_i) + [E(Y_i)]^2 = \sigma^2 + \mu^2 \\
E(\bar{Y}^2) &= V(\bar{Y}) + [E(\bar{Y})]^2 = \frac{\sigma^2}{n} + \mu^2 \\
\end{align}

$$E\left[ \sum_{i=1}^n (Y_i - \bar{Y})^2 \right] = \sum_{i=1}^n E(Y_i^2) - n E(\bar{Y}^2) = n(\sigma^2 + \mu^2) - n \left( \frac{\sigma^2}{n} + \mu^2 \right) = (n-1) \sigma^2 $$

Therefore,

$$E(s^2) = \frac{1}{n-1} E\left[ \sum_{i=1}^n (Y_i - \bar{Y})^2 \right] = \frac{1}{n-1} (n-1) \sigma^2 = \sigma^2$$

$$E(s'^2) = \frac{1}{n} E\left[ \sum_{i=1}^n (Y_i - \bar{Y})^2 \right] = \frac{n-1}{n} \sigma^2 = \sigma^2 - \frac{\sigma^2}{n}$$

Hence, $s^2$ is an unbiased estimator of $\sigma^2$, whereas $s'^2$ is biased with $B(s'^2) = -\sigma^2/n$.
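A small simulation can make this result concrete. The sketch below averages $s^2$ and $s'^2$ over many samples. The exponential population (rate 0.5, so $\sigma^2 = 4$), the sample size and the number of repetitions are assumptions picked for illustration; a deliberately non-normal population is used because the derivation does not depend on the shape of the distribution.

```python
import random

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    ybar = sum(sample) / len(sample)
    return sum((y - ybar) ** 2 for y in sample)

def average_variance_estimates(n, reps, rate, seed=0):
    """Average s^2 (divide by n-1) and s'^2 (divide by n) over many samples."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_s2_prime = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(rate) for _ in range(n)]
        ss = sum_sq_dev(sample)
        total_s2 += ss / (n - 1)
        total_s2_prime += ss / n
    return total_s2 / reps, total_s2_prime / reps

# An exponential distribution with rate 0.5 has variance 1 / 0.5**2 = 4.
mean_s2, mean_s2_prime = average_variance_estimates(n=5, reps=40000, rate=0.5)

print(f"average s^2  = {mean_s2:.3f}   (theory: 4.0)")
print(f"average s'^2 = {mean_s2_prime:.3f}   (theory: 4.0 * (5 - 1) / 5 = 3.2)")
```

With a small $n$ the gap between the two averages is easy to see; it shrinks like $\sigma^2/n$ as the sample size grows.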

## 12.2 The Method of Moments (MOM)
<hr>

One of the oldest methods for deriving point estimators is the method of moments. Recall that the $k$th moment of a random variable, taken about the origin, is:

$$\mu'_k = E(Y^k)$$

The corresponding $k$th sample moment is the average:

$$m'_k = \frac{1}{n} \sum_{i=1}^n Y_i^k$$

Sample moments should provide good estimates of the corresponding population moments. That is, $m'_k$ should be a good estimator of $\mu'_k$. Then, because the population moments are functions of the population parameters, we can equate corresponding population and sample moments and solve for the desired estimators.

***Choose as estimates those values of the parameters that are solutions of the equations $\mu'_k=m'_k$, for $k=1,2,\cdots,t$ where $t$ is the number of parameters to be estimated.***

<br>

**Example:** Let $Y_1, Y_2, \cdots, Y_n$ denote a random sample from the pdf:

$$f(y \mid \theta) = \begin{cases}
(\theta+1)y^\theta & 0<y<1; \theta>-1 \\
0 & \text{elsewhere}\\
\end{cases}
$$

Find an estimator for $\theta$ by the method of moments. Show that the estimator is consistent.

Since there is only one parameter $\theta$, we need only the first moment ($t=1$):

$$\mu'_1 = E(Y) \quad \quad m'_1 = \frac{1}{n} \sum_{i=1}^n Y_i = \bar{Y}$$

Setting $\mu'_1$ equal to $m'_1$:

$$E(Y) = \bar{Y}$$

Computing $E(Y)$:

$$E(Y) = \int_{0}^1 y(\theta+1)y^\theta dy = (\theta + 1) \int_{0}^1 y^{\theta+1} dy = \frac{\theta+1}{\theta+2}$$

Therefore,

$$\frac{\theta+1}{\theta+2} = \bar{Y}$$

Solving for $\theta$:

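$$\hat{\theta}_{\text{MOM}} = \frac{2 \bar{Y} - 1}{1 - \bar{Y}}$$

For consistency: by the law of large numbers, $\bar{Y}$ converges in probability to $E(Y) = \frac{\theta+1}{\theta+2}$. The function $g(x) = \frac{2x-1}{1-x}$ is continuous at that point and $g\left(\frac{\theta+1}{\theta+2}\right) = \theta$, so $\hat{\theta}_{\text{MOM}} = g(\bar{Y})$ converges in probability to $\theta$. Hence the estimator is consistent.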
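The estimator can be tried on simulated data. The following is a minimal sketch, assuming a true value $\theta = 2$ and drawing from the pdf by inverse transform: the CDF is $F(y) = y^{\theta+1}$ on $(0,1)$, so $Y = U^{1/(\theta+1)}$ for a uniform $U$. The function names, the seed and the sample sizes are illustrative choices, not part of the notes.

```python
import random

def sample_power_pdf(theta, n, rng):
    """Draw n values from f(y) = (theta + 1) * y**theta on (0, 1).

    Inverse transform: F(y) = y**(theta + 1), so Y = U**(1 / (theta + 1)).
    """
    return [rng.random() ** (1.0 / (theta + 1.0)) for _ in range(n)]

def theta_mom(sample):
    """Method-of-moments estimate: (2 * ybar - 1) / (1 - ybar)."""
    ybar = sum(sample) / len(sample)
    return (2.0 * ybar - 1.0) / (1.0 - ybar)

rng = random.Random(1)   # fixed seed so the run is repeatable
true_theta = 2.0         # assumed value, illustration only

# The estimate should settle near true_theta as n grows.
for n in (100, 10_000, 100_000):
    estimate = theta_mom(sample_power_pdf(true_theta, n, rng))
    print(f"n = {n:>7}: theta_hat = {estimate:.4f}")
```

Estimates from small samples can be far from the true value because the denominator $1 - \bar{Y}$ amplifies sampling noise in $\bar{Y}$.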

<br>

**Example:** If $Y_1,Y_2, \cdots ,Y_n$ denote a random sample from the normal distribution with known mean $\mu=0$ and unknown variance $\sigma^2$, find the method-of-moments estimator of $\sigma^2$.

$$Y_i \text{(iid)} \sim N(0,\sigma^2)$$

Because the mean $\mu=0$ is known, only one equation is needed. The first moment $E(Y)=\mu=0$ does not involve $\sigma^2$, so we equate the second population moment to the second sample moment.

$$\mu'_2 = E(Y^2), \quad \quad m'_2 = \frac{1}{n} \sum_{i=1}^n Y_i^2$$

$$E(Y^2) = V(Y) + [E(Y)]^2 = \sigma^2 + 0^2 = \sigma^2$$

Therefore,

$$\hat{\sigma}^2_{\text{MOM}} = m'_2 = \frac{1}{n} \sum_{i=1}^n Y_i^2$$
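As a quick numerical illustration, the estimator is just the average of the squared observations. The sketch below assumes $\sigma = 3$ (so $\sigma^2 = 9$) and a sample of 50,000 draws; both values are arbitrary choices for the demonstration.

```python
import random

def sigma2_mom(sample):
    """Method-of-moments estimate of sigma^2 when the mean is known to be 0."""
    return sum(y * y for y in sample) / len(sample)

rng = random.Random(2)   # fixed seed so the run is repeatable
sigma = 3.0              # assumed true standard deviation, illustration only

sample = [rng.gauss(0.0, sigma) for _ in range(50000)]
sigma2_hat = sigma2_mom(sample)

print(f"sigma^2_hat = {sigma2_hat:.3f}   (true value: {sigma ** 2:.1f})")
```

Note that no sample mean is subtracted here, because the mean is known rather than estimated.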

Similarly,

\begin{align}
\hat{\beta}_ {\text{MOM}} &= \bar{Y} & \text{(exponential)} \\
\hat{p}_ {\text{MOM}} &= \frac{1}{\bar{Y}} & \text{(geometric)} \\
\hat{\lambda}_ {\text{MOM}} &= \bar{Y} & \text{(Poisson)} \\
\end{align}

A gamma random variable has two parameters, $\alpha$ and $\beta$, so we need two equations: one from the first moment and one from the second.

$$E(Y)=\alpha \beta, \quad V(Y)=\alpha \beta^2, \quad E(Y^2)=\alpha \beta^2 + (\alpha \beta)^2$$

$$\mu'_1 = m'_1 \quad \rightarrow E(Y)=\bar{Y} \quad \rightarrow \alpha \beta = \bar{Y} \quad \rightarrow \hat{\alpha}_{\text{MOM}} = \frac{\bar{Y}}{\beta}$$

$$\mu'_2 = m'_2 \quad \rightarrow \alpha \beta^2 + (\alpha \beta)^2 = \frac{1}{n} \sum_{i=1}^n Y_i^2$$

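Substituting $\alpha \beta = \bar{Y}$ gives $\bar{Y} \beta + \bar{Y}^2 = \frac{1}{n} \sum_{i=1}^n Y_i^2$. Solving for $\beta$:

$$\hat{\beta}_{\text{MOM}} = \frac{\sum_{i=1}^n Y_i^2 - n\bar{Y}^2}{n \bar{Y}} = \frac{\sum_{i=1}^n (Y_i - \bar{Y})^2}{n \bar{Y}}$$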

Plugging this value back into $\hat{\alpha}_{\text{MOM}}$:

$$\hat{\alpha}_{\text{MOM}} = \frac{n \bar{Y}^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2}$$
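The gamma estimators can be exercised on simulated data. This is a sketch under stated assumptions: the true values $\alpha = 3$, $\beta = 2$ and the sample size are arbitrary, and Python's `random.gammavariate(alpha, beta)` is used because it takes a shape and a scale parameter, matching $E(Y) = \alpha\beta$ here. The code computes $\hat{\alpha}$ from the final formula and then recovers $\hat{\beta}$ from the first moment equation $\alpha\beta = \bar{Y}$.

```python
import random

def gamma_mom(sample):
    """Method-of-moments estimates (alpha_hat, beta_hat) for a gamma sample."""
    n = len(sample)
    ybar = sum(sample) / n
    ss = sum((y - ybar) ** 2 for y in sample)   # sum of squared deviations
    alpha_hat = n * ybar ** 2 / ss
    beta_hat = ybar / alpha_hat                 # from alpha * beta = ybar
    return alpha_hat, beta_hat

rng = random.Random(3)               # fixed seed so the run is repeatable
true_alpha, true_beta = 3.0, 2.0     # assumed values, illustration only

sample = [rng.gammavariate(true_alpha, true_beta) for _ in range(100000)]
alpha_hat, beta_hat = gamma_mom(sample)

print(f"alpha_hat = {alpha_hat:.3f}   (true value: {true_alpha})")
print(f"beta_hat  = {beta_hat:.3f}   (true value: {true_beta})")
```

By construction the product `alpha_hat * beta_hat` reproduces the sample mean, which is exactly the first moment equation.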

## 12.3 Maximum Likelihood Estimation (MLE)
<hr>

The method of maximum likelihood selects as estimates of the parameters those values that maximize the likelihood of the observed sample.

What is a likelihood function?

- $y_1,y_2, \cdots ,y_n$ are sample observations taken on random variables $Y_1,Y_2, \cdots ,Y_n$
- The distribution of $Y_i$ depends on a parameter $\theta$
- Notation: $L(y_1,y_2,\cdots,y_n \mid \theta)$ represents the likelihood of the sample given $\theta$

\begin{align}
L(y_1, y_2, \cdots, y_n \mid \theta) &= P(y_1, \cdots, y_n \mid \theta) = \prod_{i=1}^n P(y_i \mid \theta) & \text{(discrete)} \\
L(y_1, y_2, \cdots, y_n \mid \theta) &= f(y_1, \cdots, y_n \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta) & \text{(continuous)} \\
\end{align}
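In code, the likelihood of an independent sample is a product of densities, and its logarithm is a sum of log-densities. The sketch below illustrates this with an exponential density with mean $\beta$; the choice of model, the three data values and the function names are assumptions made only for the example.

```python
import math

def exponential_pdf(y, beta):
    """Density of an exponential distribution with mean beta (example model)."""
    return math.exp(-y / beta) / beta if y >= 0 else 0.0

def likelihood(ys, theta, density):
    """L(y_1, ..., y_n | theta): product of the individual densities."""
    result = 1.0
    for y in ys:
        result *= density(y, theta)
    return result

def log_likelihood(ys, theta, density):
    """ln L: the product becomes a sum of log-densities."""
    return sum(math.log(density(y, theta)) for y in ys)

ys = [0.5, 1.2, 2.0]          # made-up observations, illustration only
ybar = sum(ys) / len(ys)

# Evaluate the same sample under several candidate parameter values.
for beta in (0.5, 1.0, ybar, 2.0):
    print(f"beta = {beta:.3f}: L = {likelihood(ys, beta, exponential_pdf):.5f}, "
          f"ln L = {log_likelihood(ys, beta, exponential_pdf):.4f}")
```

Working with the log-likelihood is usually preferable numerically, since a product of many small densities can underflow to zero for large samples.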

<br>

**Example:** Let $Y_1, Y_2, \cdots, Y_n$ denote a random sample from the pdf:

$$f(y \mid \theta) = \begin{cases}
(\theta+1)y^\theta & 0<y<1; \theta>-1 \\
0 & \text{elsewhere}\\
\end{cases}
$$

**Question:** Find the MLE for $\theta$.

\begin{align}
L(y_1, \cdots, y_n \mid \theta) &= \prod_{i=1}^n f(y_i \mid \theta) = \prod_{i=1}^n \left[ (\theta+1)y_i^\theta \right] = (\theta+1)^n (y_1 y_2 \cdots y_n)^\theta \\
\\
\mathcal{L}(y_1, \cdots, y_n \mid \theta) &= \ln \left[ (\theta+1)^n (y_1 y_2 \cdots y_n)^\theta \right] \\
&= n \ln{(\theta+1)} + \theta \ln{(y_1 y_2 \cdots y_n)} \\
&= n \ln{(\theta+1)} + \theta \sum_{i=1}^n \ln{(y_i)} \\
\end{align}

*Taking the derivative of the log-likelihood and setting it equal to zero (maximization):*

\begin{align}
\frac{d \mathcal{L}}{d\theta} &= n \left( \frac{1}{\theta+1} \right) + \sum_{i=1}^n \ln{(y_i)} = 0 \\
n \left( \frac{1}{\theta+1} \right) &= - \sum_{i=1}^n \ln{(y_i)} \\
\theta+1 &= -\frac{n}{\sum_{i=1}^n \ln{(y_i)}} \\
\\
\hat{\theta}_ {\text{MLE}} &= -\frac{n}{\sum_{i=1}^n \ln{(y_i)}} - 1
\end{align}
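The closed-form MLE can be checked against the log-likelihood directly. The sketch below assumes a true value $\theta = 2$, simulates data by inverse transform ($F(y) = y^{\theta+1}$, so $Y = U^{1/(\theta+1)}$), and compares the log-likelihood at $\hat{\theta}_{\text{MLE}}$ with its value at nearby points. The seed, sample size and function names are illustrative assumptions.

```python
import math
import random

def sample_power_pdf(theta, n, rng):
    """Draw n values from f(y) = (theta + 1) * y**theta by inverse transform.

    1 - rng.random() lies in (0, 1], which keeps ln(y) finite.
    """
    return [(1.0 - rng.random()) ** (1.0 / (theta + 1.0)) for _ in range(n)]

def log_likelihood(theta, ys):
    """n * ln(theta + 1) + theta * sum(ln y_i)."""
    return len(ys) * math.log(theta + 1.0) + theta * sum(math.log(y) for y in ys)

def theta_mle(ys):
    """Closed-form maximizer: -n / sum(ln y_i) - 1."""
    return -len(ys) / sum(math.log(y) for y in ys) - 1.0

rng = random.Random(4)   # fixed seed so the run is repeatable
true_theta = 2.0         # assumed value, illustration only

ys = sample_power_pdf(true_theta, 50000, rng)
theta_hat = theta_mle(ys)
print(f"theta_hat = {theta_hat:.4f}   (true value: {true_theta})")

# The log-likelihood should be lower on either side of theta_hat.
for candidate in (theta_hat - 0.1, theta_hat, theta_hat + 0.1):
    print(f"theta = {candidate:.4f}: ln L = {log_likelihood(candidate, ys):.2f}")
```

Since every $y_i$ lies in $(0,1)$, the sum of logs is negative, so $\hat{\theta}_{\text{MLE}} + 1$ is positive and the estimate respects the constraint $\theta > -1$.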