# Question 8

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.

Define some target function $f(x)$ where $x \in \mathcal{X}$ and $\mathcal{X}$ is some input space. Assume that the function $f$ involves uncertainty such that

\begin{align*}
    y \sim f(x)
\end{align*}

where $y$ is an observation associated with an input $x$. Define the expected value of $f(x)$ under the distribution of $f$ as

\begin{align*}
    \mu = \mathbb{E}[f(x)] = \int f(x) p(f) df
\end{align*}

Without making any assumptions about the space of functions $f$, in particular about how the noise interacts with the deterministic part of $f$ (e.g. additive or multiplicative noise), denote an estimator of $f(x)$ by

\begin{align*}
    \hat{f}(x)
\end{align*}

The estimator is unbiased if its expected value equals the expected value of $f$, that is,

\begin{align*}
    \mathbb{E}[\hat{f}(x)] = \int \hat{f}(x) p(\hat{f}) d\hat{f} = \mu
\end{align*}

Notice that an estimator can be unbiased even when $p(f) \neq p(\hat{f})$. Unbiasedness only requires the first moments of the two distributions to agree, so the distribution of the estimator may otherwise differ from the true distribution; in this sense, unbiasedness is a first-moment matching condition.
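As a concrete illustration (an assumption of ours, not part of the setup above), take the target to be the mean $\mu$ of i.i.d. Gaussian observations and the estimator to be the sample mean of $n$ draws. The sketch below, using only Python's standard library, simulates many sample means: their average should sit near $\mu$, while their variance should be close to $\sigma^2 / n$ rather than $\sigma^2$, so the estimator's distribution differs from that of a single observation. The parameter values, seed and the helper name `sample_means` are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` independent sample means, each from n i.i.d. N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]

mu, sigma, n = 2.0, 3.0, 10
means = sample_means(mu, sigma, n, trials=20_000)

# First moments agree: the sample mean is unbiased for mu ...
print(f"mean of estimator:     {statistics.fmean(means):.3f} (mu = {mu})")
# ... but the distributions differ: Var(X) = sigma^2, while Var(sample mean) = sigma^2 / n.
print(f"variance of estimator: {statistics.pvariance(means):.3f} "
      f"(sigma^2 = {sigma ** 2}, sigma^2 / n = {sigma ** 2 / n})")
```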

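Consistency is a different kind of condition on $p(\hat{f})$: rather than constraining its mean at a fixed sample size, it describes how $p(\hat{f})$ behaves as the number of samples grows. Writing $\hat{f}_n(x)$ for the estimator built from $n$ samples, the estimator is consistent if, as $n \to \infty$,

\begin{align*}
    \hat{f}_n(x) \xrightarrow{p} \mu
\end{align*}

where $\xrightarrow{p}$ denotes convergence in probability: $\lim_{n \to \infty} \mathbb{P}(|\hat{f}_n(x) - \mu| > \epsilon) = 0$ for every $\epsilon > 0$. Because the limit $\mu$ is a constant, this is equivalent to convergence in distribution to $\mu$. Neither unbiasedness nor consistency implies the other, as the examples below show.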
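A rough way to check consistency numerically is to estimate $\mathbb{P}(|\hat{\theta}_n - \theta| > \epsilon)$ by simulation for increasing $n$. The sketch below does this for the sample mean of standard normal data; the helper `deviation_probability`, the Gaussian sampler, $\epsilon = 0.1$ and the trial count are illustrative assumptions, and a Monte Carlo estimate can only suggest, not prove, convergence.

```python
import random
import statistics

def deviation_probability(estimator, sampler, target, n, eps, trials=1_000, seed=0):
    """Monte Carlo estimate of P(|estimator(sample of size n) - target| > eps)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        sample = [sampler(rng) for _ in range(n)]
        if abs(estimator(sample) - target) > eps:
            exceed += 1
    return exceed / trials

def standard_normal(rng):
    """Sampler for the illustrative population: mu = 0, sigma = 1."""
    return rng.gauss(0.0, 1.0)

eps = 0.1
probs = {n: deviation_probability(statistics.fmean, standard_normal, 0.0, n, eps)
         for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n = {n:4d}: P(|mean - mu| > {eps}) ~ {p:.3f}")
```

The estimated probability shrinks towards zero as $n$ grows, which is what consistency of the sample mean predicts.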

## Example 1

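Unbiased but not consistent estimator.

The standard example is to estimate the mean $ \mu $ of i.i.d. $ X_1, X_2, \dots, X_n $ using only the first observation, $ \hat{\mu}_n = X_1 $. It is unbiased for every $ n $, since $ \mathbb{E}[\hat{\mu}_n] = \mathbb{E}[X_1] = \mu $. However, its distribution does not depend on $ n $: $ \mathbb{P}(|\hat{\mu}_n - \mu| > \epsilon) = \mathbb{P}(|X_1 - \mu| > \epsilon) $ for all $ n $, and for any non-degenerate distribution this probability is positive for some $ \epsilon > 0 $, so it cannot tend to zero. Hence $ \hat{\mu}_n $ is not consistent.

The sample variance estimator below is examined for comparison; its bias depends on the scaling used.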
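The sketch below simulates the estimator $\hat{\mu}_n = X_1$, which keeps only the first of $n$ i.i.d. observations when estimating their mean $\mu$. Assuming Gaussian data with illustrative values $\mu = 5$, $\sigma = 2$ and $\epsilon = 0.5$, the average estimate should stay near $\mu$ for every $n$ (unbiasedness), while the estimated $\mathbb{P}(|\hat{\mu}_n - \mu| > \epsilon)$ should stay near $0.80$ however large $n$ is (no consistency). The function names are hypothetical.

```python
import random
import statistics

def first_observation(sample):
    """Unbiased but inconsistent estimator of the mean: keep only the first draw."""
    return sample[0]

def simulate(estimator, mu, sigma, n, eps, trials=5_000, seed=1):
    """Return (average estimate, estimated P(|estimate - mu| > eps)) for N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(trials)]
    exceed = sum(abs(e - mu) > eps for e in estimates) / trials
    return statistics.fmean(estimates), exceed

mu, sigma, eps = 5.0, 2.0, 0.5
results = {n: simulate(first_observation, mu, sigma, n, eps) for n in (1, 10, 100)}
for n, (avg, p) in results.items():
    print(f"n = {n:3d}: average estimate {avg:.3f}, P(|X_1 - mu| > {eps}) ~ {p:.3f}")
```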

### Sample variance estimator

Let $ X_1, X_2, \dots, X_n $ be independent and identically distributed (i.i.d.) random variables with mean $ \mu = \mathbb{E}[X_i] $ and variance $ \sigma^2 = \mathbb{E}[(X_i - \mu)^2] $.

The *sample mean* is given by:

\begin{align*}
    \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
\end{align*}

The *sample variance* estimator $ S_n^2 $, scaled by $ \frac{1}{n} $, is defined as:

\begin{align*}
    S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2
\end{align*}

This estimator differs from the commonly used unbiased estimator, which is scaled by $ \frac{1}{n-1} $. We will examine the bias and consistency of $ S_n^2 $.
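Both scalings are easy to compute directly. In the sketch below (illustrative data and hypothetical helper names), the $ \frac{1}{n} $ version agrees with Python's `statistics.pvariance` (population variance) and the $ \frac{1}{n-1} $ version with `statistics.variance` (sample variance).

```python
import statistics

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)

def var_1_over_n(xs):
    """S_n^2: divides by n (same convention as statistics.pvariance)."""
    return sum_sq_dev(xs) / len(xs)

def var_1_over_n_minus_1(xs):
    """Unbiased version: divides by n - 1 (same convention as statistics.variance)."""
    return sum_sq_dev(xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var_1_over_n(data))          # 4.0
print(var_1_over_n_minus_1(data))  # 32 / 7, about 4.571
print(statistics.pvariance(data), statistics.variance(data))
```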

#### Unbiasedness of $ S_n^2 $

To check if $ S_n^2 $ is unbiased, we compute its expected value and compare it to the population variance $ \sigma^2 $. Adding and subtracting $ \mu $ inside the squared term gives:

\begin{align*}
    S_n^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ (X_i - \mu) - (\bar{X} - \mu) \right]^2
\end{align*}    

Expanding the square:

\begin{align*}
    S_n^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ (X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2 \right]
\end{align*}  

Taking the expectation of $ S_n^2 $:

First term:

\begin{align*}
    \mathbb{E} \left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 \right] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[(X_i - \mu)^2] = \frac{1}{n} \cdot n \sigma^2 = \sigma^2
\end{align*}

Second term:

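The cross term is not zero: $ X_i $ appears in $ \bar{X} $, so the two are correlated rather than independent. Writing $ \bar{X} - \mu = \frac{1}{n} \sum_{j=1}^{n} (X_j - \mu) $ and using the independence of the $ X_j $, only the $ j = i $ term survives:

\begin{align*}
    \mathbb{E}[(X_i - \mu)(\bar{X} - \mu)] = \frac{1}{n} \sum_{j=1}^{n} \mathbb{E}[(X_i - \mu)(X_j - \mu)] = \frac{1}{n} \mathbb{E}[(X_i - \mu)^2] = \frac{\sigma^2}{n}
\end{align*}

so the second term contributes $ -\frac{2}{n} \sum_{i=1}^{n} \frac{\sigma^2}{n} = -\frac{2\sigma^2}{n} $.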

Third term:

The expectation of $ (\bar{X} - \mu)^2 $ is the variance of the sample mean:

\begin{align*}
    \mathbb{E}[(\bar{X} - \mu)^2] = \frac{\sigma^2}{n}
\end{align*}

Thus, the expectation of $ S_n^2 $ becomes:

\begin{align*}
    \mathbb{E}[S_n^2] = \sigma^2 - \frac{\sigma^2}{n}
\end{align*}

This shows that $ S_n^2 $ is *biased*: its bias is $ \mathbb{E}[S_n^2] - \sigma^2 = -\frac{\sigma^2}{n} $, so it systematically underestimates $ \sigma^2 $. However, as $ n \to \infty $, the bias vanishes, and $ S_n^2 $ is *asymptotically unbiased*.

#### Corrected Estimator and Unbiasedness

The unbiased version of the sample variance is scaled by $ \frac{1}{n-1} $:

\begin{align*}
    S_{n,\text{unbiased}}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
\end{align*}

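Since $ S_{n,\text{unbiased}}^2 = \frac{n}{n-1} S_n^2 $, this corrected version is unbiased for every $ n \geq 2 $:

\begin{align*}
    \mathbb{E}[S_{n,\text{unbiased}}^2] = \frac{n}{n-1} \left( \sigma^2 - \frac{\sigma^2}{n} \right) = \sigma^2
\end{align*}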
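A Monte Carlo check of the two expectations, assuming $ X_i \sim \mathcal{N}(0, \sigma^2) $ with illustrative $ n = 5 $ and $ \sigma = 2 $: the average of $ S_n^2 $ should be close to $ \frac{n-1}{n}\sigma^2 = 3.2 $ and the average of the corrected estimator close to $ \sigma^2 = 4 $. The helper name and trial count are arbitrary choices for this sketch.

```python
import random
import statistics

def average_variance_estimates(n, sigma, trials=20_000, seed=2):
    """Average the 1/n and 1/(n-1) variance estimators over many N(0, sigma^2) samples."""
    rng = random.Random(seed)
    biased, unbiased = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        biased.append(ss / n)          # S_n^2
        unbiased.append(ss / (n - 1))  # S_{n,unbiased}^2
    return statistics.fmean(biased), statistics.fmean(unbiased)

n, sigma = 5, 2.0
avg_biased, avg_unbiased = average_variance_estimates(n, sigma)
print(f"E[S_n^2] ~ {avg_biased:.3f}  vs (n-1)/n * sigma^2 = {(n - 1) / n * sigma ** 2:.3f}")
print(f"E[S_unbiased^2] ~ {avg_unbiased:.3f}  vs sigma^2 = {sigma ** 2:.3f}")
```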

#### Consistency of $ S_n^2 $

To check for consistency, we require that for any $ \epsilon > 0 $,

\begin{align*}
    \lim_{n \to \infty} \mathbb{P}(|S_n^2 - \sigma^2| > \epsilon) = 0
\end{align*}

Let’s compute the variance of $ S_n^2 $. Using the expression:

\begin{align*}
    S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 - (\bar{X} - \mu)^2
\end{align*}
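The decomposition above is an algebraic identity (it holds with any constant in place of $ \mu $), so it can be checked on a single simulated sample. This sketch uses Gaussian data purely for illustration; `decomposition_sides` is a hypothetical helper.

```python
import random

def decomposition_sides(xs, c):
    """Return both sides of (1/n) sum (x - xbar)^2 = (1/n) sum (x - c)^2 - (xbar - c)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    lhs = sum((x - xbar) ** 2 for x in xs) / n
    rhs = sum((x - c) ** 2 for x in xs) / n - (xbar - c) ** 2
    return lhs, rhs

rng = random.Random(3)
mu = 1.5
xs = [rng.gauss(mu, 1.0) for _ in range(50)]
lhs, rhs = decomposition_sides(xs, mu)
print(lhs, rhs)  # equal up to floating-point rounding
```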

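The variance of $ S_n^2 $ is measured around its own expectation, not around $ \sigma^2 $:

\begin{align*}
    \text{Var}(S_n^2) = \mathbb{E}\left[(S_n^2 - \mathbb{E}[S_n^2])^2\right]
\end{align*}

The quantity $ \mathbb{E}[(S_n^2 - \sigma^2)^2] $ is instead the mean squared error, which equals $ \text{Var}(S_n^2) + \left(\mathbb{E}[S_n^2] - \sigma^2\right)^2 $.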

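If the $ X_i $ are normally distributed, then $ n S_n^2 / \sigma^2 \sim \chi^2_{n-1} $, whose variance is $ 2(n-1) $, so

\begin{align*}
    \text{Var}(S_n^2) = \frac{2(n-1)\sigma^4}{n^2}
\end{align*}

More generally, $ \text{Var}(S_n^2) = O(1/n) $ whenever the fourth moment $ \mathbb{E}[(X_i - \mu)^4] $ is finite.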

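As $ n \to \infty $, the variance tends to zero, so $ S_n^2 $ becomes concentrated around its expectation $ \sigma^2 - \frac{\sigma^2}{n} $, which itself tends to $ \sigma^2 $. The finite-sample bias therefore does not prevent consistency. The mean squared error $ \mathbb{E}[(S_n^2 - \sigma^2)^2] = \text{Var}(S_n^2) + \frac{\sigma^4}{n^2} $ tends to zero, and Markov's inequality applied to $ (S_n^2 - \sigma^2)^2 $ gives, for any $ \epsilon > 0 $,

\begin{align*}
    \mathbb{P}(|S_n^2 - \sigma^2| > \epsilon) \leq \frac{\mathbb{E}[(S_n^2 - \sigma^2)^2]}{\epsilon^2} \to 0
\end{align*}

Hence $ S_n^2 $ is *consistent*, even though it is biased for every finite $ n $.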

To check for the consistency of the unbiased sample variance estimator, recall that the unbiased estimator is:

\begin{align*}
    S_{n,\text{unbiased}}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
\end{align*}

The estimator $S_{n,\text{unbiased}}^2$ is consistent if:

\begin{align*}
    \lim_{n \to \infty} \mathbb{P}(|S_{n,\text{unbiased}}^2 - \sigma^2| > \epsilon) = 0 \quad \text{for any} \quad \epsilon > 0
\end{align*}

This condition means that as the sample size $n$ increases, the probability that $S_{n,\text{unbiased}}^2$ deviates significantly from $\sigma^2$ should approach zero. To check this, we need to compute the variance of $S_{n,\text{unbiased}}^2$.

When the $X_i$ are normally distributed, the standard result $(n-1) S_{n,\text{unbiased}}^2 / \sigma^2 \sim \chi^2_{n-1}$ gives:

\begin{align*}
    \text{Var}(S_{n,\text{unbiased}}^2) = \frac{2\sigma^4}{n-1}
\end{align*}

As $n \to \infty$, we observe that:

\begin{align*}
    \lim_{n \to \infty} \text{Var}(S_{n,\text{unbiased}}^2) = 0
\end{align*}

This implies that $S_{n,\text{unbiased}}^2$ becomes concentrated around its expectation, $\sigma^2$, as $n$ grows. Formally, because the estimator is unbiased (i.e., $\mathbb{E}[S_{n,\text{unbiased}}^2] = \sigma^2$), Chebyshev's inequality gives $\mathbb{P}(|S_{n,\text{unbiased}}^2 - \sigma^2| > \epsilon) \leq \text{Var}(S_{n,\text{unbiased}}^2) / \epsilon^2$, which tends to zero for every $\epsilon > 0$.

Therefore, the unbiased sample variance estimator $S_{n,\text{unbiased}}^2$ is consistent.
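The sketch below estimates $\mathbb{P}(|S_{n,\text{unbiased}}^2 - \sigma^2| > \epsilon)$ by simulation for growing $n$ and prints it next to the Chebyshev bound $\text{Var}(S_{n,\text{unbiased}}^2)/\epsilon^2$. It assumes normal data, since the bound uses $\text{Var}(S_{n,\text{unbiased}}^2) = \frac{2\sigma^4}{n-1}$, which holds for normally distributed $X_i$; $\sigma = 1$, $\epsilon = 0.2$, the trial count and the helper names are illustrative.

```python
import random

def unbiased_variance(xs):
    """S_{n,unbiased}^2: divide the sum of squared deviations by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def deviation_probability(n, sigma, eps, trials=1_000, seed=4):
    """Monte Carlo estimate of P(|S_{n,unbiased}^2 - sigma^2| > eps) for N(0, sigma^2) data."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        if abs(unbiased_variance(xs) - sigma ** 2) > eps:
            exceed += 1
    return exceed / trials

sigma, eps = 1.0, 0.2
probs = {}
for n in (10, 100, 1000):
    probs[n] = deviation_probability(n, sigma, eps)
    chebyshev = 2 * sigma ** 4 / ((n - 1) * eps ** 2)  # Var / eps^2, valid for normal data
    print(f"n = {n:4d}: simulated {probs[n]:.3f}, Chebyshev bound {min(chebyshev, 1.0):.3f}")
```

The Chebyshev bound is loose, but it is enough to force the probability to zero, which is all that consistency requires.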


#### Conclusions

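The sample variance estimator with the $ \frac{1}{n} $ scaling factor is:

- *Biased* for finite $ n $, with $ \mathbb{E}[S_n^2] = \sigma^2 - \frac{\sigma^2}{n} $, but *asymptotically unbiased*: $ \mathbb{E}[S_n^2] \to \sigma^2 $ as $ n \to \infty $.
- *Consistent*: both its bias and its variance vanish as $ n \to \infty $, so $ S_n^2 $ converges in probability to $ \sigma^2 $.

The $ \frac{1}{n-1} $ version is both unbiased and consistent. Neither scaling gives an unbiased but inconsistent estimator; the $ \frac{1}{n} $ version is rather another example of a biased but consistent estimator.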





## Example 2

Biased but consistent estimator.

Consider a uniform population with an unknown upper bound $\theta$ [(original example)](https://math.stackexchange.com/questions/150586/expected-value-of-max-of-uniform-iid-variables):

\begin{align*}
    X \sim \mathcal{U}((0, \theta))
\end{align*}

A simple estimator of $\theta$ is the sample maximum

\begin{align*}
    \hat{\theta} = \max \{ X_1, X_2, \cdots, X_n \}
\end{align*}

As above, let $\{ X_1, X_2, \dots, X_n \}$ be $n$ independent and identically distributed (IID) random variables, each uniformly distributed over the interval $(0, \theta)$. To assess the bias of $\hat{\theta}$, we find the expected value of their maximum, $M_n = \max(X_1, X_2, \dots, X_n) = \hat{\theta}$.

### Step 1: CDF of the Maximum

The cumulative distribution function (CDF) of $ M_n $, denoted $ F_{M_n}(x) $, gives the probability that $ M_n $ is less than or equal to some value $ x $:

\begin{align*}
    F_{M_n}(x) = P(M_n \leq x).
\end{align*}

Since $ M_n $ is the maximum of $ X_1, X_2, \dots, X_n $, for $ M_n \leq x $, it must be true that all the individual $ X_i $'s are less than or equal to $ x $. Thus, we can express this as:

\begin{align*}
    F_{M_n}(x) = P(X_1 \leq x \text{ and } X_2 \leq x \text{ and } \dots \text{ and } X_n \leq x).
\end{align*}

Because the variables $ X_1, X_2, \dots, X_n $ are independent, the joint probability of all of them being less than or equal to $ x $ is the product of their individual probabilities:

\begin{align*}
    F_{M_n}(x) = P(X_1 \leq x) \cdot P(X_2 \leq x) \cdot \dots \cdot P(X_n \leq x).
\end{align*}

Since the $ X_i $'s are identically distributed, $ P(X_i \leq x) $ is the same for all $ i $. For a uniform distribution over $ [0, \theta] $, the probability that $ X_i \leq x $ is simply $ x / \theta$ (because the CDF of a uniform random variable $ X \sim U(0, \theta) $ is $ F_X(x) = x / \theta $ for $ 0 \leq x \leq \theta $).

Therefore, we get:

\begin{align*}
    F_{M_n}(x) = (P(X_1 \leq x))^n = \left( \frac{x}{\theta}\right)^n, \quad \text{for } 0 \leq x \leq \theta.
\end{align*}
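A quick simulation can confirm the CDF $F_{M_n}(x) = (x/\theta)^n$: draw many samples of size $n$ from $\mathcal{U}(0, \theta)$, take each maximum, and count how often it is at most $x$. The values $\theta = 3$, $n = 4$ and the evaluation points are arbitrary; `empirical_cdf_of_max` is a hypothetical helper.

```python
import random

def empirical_cdf_of_max(theta, n, x, trials=20_000, seed=5):
    """Fraction of simulated M_n = max(X_1, ..., X_n), X_i ~ U(0, theta), with M_n <= x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = max(rng.uniform(0.0, theta) for _ in range(n))
        if m <= x:
            hits += 1
    return hits / trials

theta, n = 3.0, 4
checks = {x: (empirical_cdf_of_max(theta, n, x), (x / theta) ** n) for x in (1.0, 2.0, 2.7)}
for x, (empirical, exact) in checks.items():
    print(f"x = {x}: empirical {empirical:.4f}, (x/theta)^n = {exact:.4f}")
```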

### Step 2: PDF of the Maximum

To find the probability density function (PDF) of $ M_n $, we differentiate the CDF:

\begin{align*}
    f_{M_n}(x) = \frac{d}{dx} F_{M_n}(x) = \frac{d}{dx} \left( \frac{x}{\theta}\right)^n = \frac{n}{\theta} \left( \frac{x}{\theta}\right)^{n-1}, \quad \text{for } 0 \leq x \leq \theta.
\end{align*}

### Step 3: Expected Value of the Maximum

To compute the expected value of the maximum, we use the definition of the expected value:

\begin{align*}
    E[M_n] = \int_0^{\theta} x f_{M_n}(x) \, dx.
\end{align*}

Substitute the PDF $ f_{M_n}(x) = \frac{n}{\theta} \left( \frac{x}{\theta}\right)^{n-1} $:

\begin{align*}
    E[M_n] = \int_0^{\theta} x \cdot \frac{n}{\theta} \left( \frac{x}{\theta}\right)^{n-1} \, dx = \frac{n}{\theta^{n}} \int_0^{\theta} x^n \, dx.
\end{align*}

The integral of $ x^n $ over $ [0, \theta] $ is $ \frac{\theta^{n+1}}{n+1} $, so:

\begin{align*}
    E[M_n] = \frac{n}{\theta^{n}} \cdot \frac{\theta^{n+1}}{n+1} = \frac{n}{n+1} \theta.
\end{align*}

Hence the expected value of the maximum of $ n $ IID random variables uniformly distributed over $ [0, \theta] $ is:

\begin{align*}
    E[M_n] = \frac{n}{n+1} \theta.
\end{align*}
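The result $E[M_n] = \frac{n}{n+1}\theta$ can be checked by simulation. The sketch below (illustrative $\theta = 3$, hypothetical helper `mean_of_max`) should also show the averages staying below $\theta$, i.e. a downward bias that shrinks as $n$ grows.

```python
import random
import statistics

def mean_of_max(theta, n, trials=20_000, seed=6):
    """Monte Carlo estimate of E[M_n] for M_n = max of n i.i.d. U(0, theta) draws."""
    rng = random.Random(seed)
    return statistics.fmean(max(rng.uniform(0.0, theta) for _ in range(n)) for _ in range(trials))

theta = 3.0
results = {n: mean_of_max(theta, n) for n in (1, 5, 50)}
for n, estimate in results.items():
    print(f"n = {n:2d}: simulated {estimate:.3f}, n/(n+1) * theta = {n / (n + 1) * theta:.3f}")
```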

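We have shown that $M_n$ is a biased estimator of $\theta$: its bias is $E[M_n] - \theta = -\frac{\theta}{n+1}$, so it systematically underestimates $\theta$. Nevertheless, $M_n$ is consistent. Asymptotic unbiasedness, $\lim_{n \to \infty} E[M_n] = \theta$, is not enough on its own, so we check convergence in probability directly with the CDF from Step 1. Since $M_n \leq \theta$, for any $0 < \epsilon < \theta$,

\begin{align*}
    P(|M_n - \theta| > \epsilon) = P(M_n < \theta - \epsilon) = \left( \frac{\theta - \epsilon}{\theta} \right)^n \to 0 \quad \text{as } n \to \infty,
\end{align*}

and for $\epsilon \geq \theta$ this probability is zero. Hence $M_n$ converges in probability to $\theta$, i.e. it is a consistent estimator.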
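The sketch below compares a simulated $P(|M_n - \theta| > \epsilon)$ with $F_{M_n}(\theta - \epsilon) = \left(\frac{\theta - \epsilon}{\theta}\right)^n$, the value implied by the CDF in Step 1 for $0 < \epsilon < \theta$. The choices $\theta = 3$, $\epsilon = 0.3$, the trial count and the helper name `exceed_probability` are illustrative assumptions.

```python
import random

def exceed_probability(theta, n, eps, trials=10_000, seed=7):
    """Monte Carlo estimate of P(|M_n - theta| > eps) for M_n = max of n U(0, theta) draws."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        m = max(rng.uniform(0.0, theta) for _ in range(n))
        if abs(m - theta) > eps:
            count += 1
    return count / trials

theta, eps = 3.0, 0.3
table = {}
for n in (5, 20, 80):
    exact = ((theta - eps) / theta) ** n  # F_{M_n}(theta - eps); valid for 0 < eps < theta
    table[n] = (exceed_probability(theta, n, eps), exact)
    print(f"n = {n:2d}: simulated {table[n][0]:.4f}, exact {exact:.4f}")
```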
