# Some statistical terminology

## Statistic
Let $X_1, X_2,\ldots,X_n$ be a random sample from a population.

A <font color="green"><b>statistic</b></font> is a quantity that:
* is a function of the sample data
* does not depend on any unknown parameters

A few examples of statistics:
- Sample mean    : $\overline{X}= \displaystyle \frac{1}{n} \sum_{i=1}^n X_i$
- Sample variance: $S^2 = \displaystyle \frac{1}{n-1}\sum_{i=1}^n (X_i -\overline{X})^2$
- Sample max     : $\max(X_1,\ldots, X_n)$
- Sample range   : $\max(X_1,\ldots, X_n) - \min(X_1,\ldots, X_n)$
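As a small illustration, the four statistics above can be computed directly from observed data. The sketch below uses only Python's standard library; the values in `sample` are made up for the example and are not taken from any real data set.

```python
import statistics

# Hypothetical sample of n = 5 observations (illustrative values only).
sample = [2.0, 4.0, 4.0, 5.0, 10.0]

# Sample mean: (1/n) * sum of the X_i.
sample_mean = statistics.mean(sample)

# Sample variance: statistics.variance divides by n - 1, matching S^2.
sample_var = statistics.variance(sample)

# Sample max and sample range.
sample_max = max(sample)
sample_range = max(sample) - min(sample)

print(sample_mean, sample_var, sample_max, sample_range)
```

Each quantity is computed from the data alone, without reference to the unknown $\mu$ or $\sigma^2$, which is what makes it a statistic.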

## Estimator
Let $X_1, X_2,\ldots,X_n$ be a random sample from a population with an unknown parameter $\theta$.<br>
A statistic $\widehat{\theta} = T(X_1, X_2,\ldots,X_n)$ that is used to approximate $\theta$ is called an <font color="green"><b>estimator</b></font> of $\theta$.

Examples:
* $\overline{X}= \displaystyle \frac{1}{n} \sum_{i=1}^n X_i$ is an estimator of $\mu$ (population mean)
* $S^2 = \displaystyle \frac{1}{n-1}\sum_{i=1}^n (X_i -\overline{X})^2$ is an estimator of the population variance $\sigma^2$

## Bias of an estimator
The <font color="green"><b>bias of an estimator</b></font> is the difference between<br> the expected value of the
estimator and the value of the parameter it is estimating.

Let $\widehat{\theta}$ be an estimator of the population parameter $\theta$.
Then the bias of $\widehat{\theta}$ is defined as:<br>
$\begin{eqnarray}
   \texttt{Bias}(\widehat{\theta}) & := & \mathbb{E}[\widehat{\theta}] - \theta
 \end{eqnarray}$
 
If $\texttt{Bias}(\widehat{\theta})=0$, then $\widehat{\theta}$ is an **unbiased** estimator of $\theta$.<br>
Examples:<br>
* $\begin{eqnarray}
  \mathbb{E}[\overline{X}] & = & \displaystyle \frac{1}{n} \sum_{i=1}^n \mathbb{E}[X_i] \\
                           & = & \mu
  \end{eqnarray}$
  
* $\begin{eqnarray}
   \mathbb{E}[S^2] & = &  \displaystyle \mathbb{E}\Bigg [\frac{1}{n-1}\sum_{i=1}^n (X_i -\overline{X})^2 \Bigg] \\
                   & = &  \displaystyle \frac{1}{n-1} \mathbb{E}\Bigg [\sum_{i=1}^n X_i^2 -n \overline{X}^2 \Bigg] \\
                   & = &  \displaystyle \frac{1}{n-1} \Big\{ \sum_{i=1}^n \mathbb{E}[X_i^2] - n\big(\frac{\sigma^2}{n} + \mu^2 \big)  \Big\} \\
                   & = &   \displaystyle \frac{1}{n-1} \Big\{ n(\mu^2 + \sigma^2)  - n\big(\frac{\sigma^2}{n} + \mu^2 \big)  \Big\} \\
                   & = & \sigma^2
  \end{eqnarray}$

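The second derivation uses $\mathbb{E}[X_i^2] = \mu^2 + \sigma^2$ and $\mathbb{E}[\overline{X}^2] = \frac{\sigma^2}{n} + \mu^2$, both of which follow from the identity $\mathbb{V}[Y] = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2$.

Hence the estimators $\overline{X}$ and $S^2$ are **unbiased**.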
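The unbiasedness of $\overline{X}$ and $S^2$ can also be checked empirically. The following is a minimal Monte Carlo sketch, assuming a normal population with hypothetical values $\mu = 3$ and $\sigma = 2$. The function name `simulate_bias` and all of its parameters are illustrative choices. For contrast, it also tracks the version of the variance estimator that divides by $n$ instead of $n-1$; that estimator is not discussed above and its bias is $-\sigma^2/n$.

```python
import random

def simulate_bias(n=5, reps=20000, mu=3.0, sigma=2.0, seed=0):
    """Estimate the bias of three estimators by repeated sampling.

    Assumption: the population is Normal(mu, sigma^2); the values of
    mu, sigma, n and reps are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    sum_mean = 0.0   # running sum of the sample means
    sum_s2 = 0.0     # running sum of S^2 (divisor n - 1)
    sum_divn = 0.0   # running sum of the divisor-n variance estimator
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        ss = sum((xi - xbar) ** 2 for xi in x)
        sum_mean += xbar
        sum_s2 += ss / (n - 1)
        sum_divn += ss / n
    # Bias = (average of the estimator over all replications) - true value.
    return {
        "bias_mean": sum_mean / reps - mu,
        "bias_s2": sum_s2 / reps - sigma ** 2,
        "bias_div_n": sum_divn / reps - sigma ** 2,
    }

result = simulate_bias()
for name, value in result.items():
    print(f"{name}: {value:+.4f}")
```

With these settings the first two estimated biases should be close to zero, up to simulation noise, while the divisor-$n$ estimator should come out near $-\sigma^2/n = -0.8$.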

## Variance of an estimator
The <font color="green"><b>variance of an estimator</b></font> is a measure of the spread<br>
of the estimator's sampling distribution.

Let $\widehat{\theta}$ be an estimator of the population parameter $\theta$.<br>
Then the variance of the estimator $\widehat{\theta}$ is defined as:<br>
$\begin{eqnarray}
   \mathbb{V}[\widehat{\theta}] & = & \displaystyle \mathbb{E}\Big[(\widehat{\theta} - \mathbb{E}[\widehat{\theta}])^2 \Big] 
 \end{eqnarray}$

Examples:<br>
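* $\begin{eqnarray}
  \mathbb{V}[\overline{X}] & = & \displaystyle \mathbb{V} \Bigg [ \frac{1}{n} \sum_{i=1}^n X_i \Bigg] \\
                           & = & \displaystyle \frac{1}{n^2} \sum_{i=1}^n \mathbb{V}[X_i] \quad \text{(the } X_i \text{ are independent)} \\
                           & = & \frac{\sigma^2}{n}
  \end{eqnarray}$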
  
* $\begin{eqnarray}
   \mathbb{V}[S^2] & = &  \displaystyle \frac{1}{n} \Bigg ( \mu_4 - \frac{n-3}{n-1}\sigma^4 \Bigg)
  \end{eqnarray}$

  where $\mu_4 := \mathbb{E}[(X_i - \mu)^4]$.
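Both variance formulas can be compared against a simulation. The sketch below is one way to do this, assuming a normal population, for which $\mu_4 = 3\sigma^4$ and the formula for $\mathbb{V}[S^2]$ reduces to $2\sigma^4/(n-1)$. The function names and the values $\mu = 3$, $\sigma = 2$, $n = 5$ are hypothetical choices made for the example.

```python
import random
import statistics

def theoretical_variances(n, sigma, mu4):
    """Return (V[X-bar], V[S^2]) from the formulas in this section."""
    var_mean = sigma ** 2 / n
    var_s2 = (mu4 - (n - 3) / (n - 1) * sigma ** 4) / n
    return var_mean, var_s2

def simulated_variances(n=5, reps=20000, mu=3.0, sigma=2.0, seed=1):
    """Estimate both variances by drawing many samples of size n.

    Assumption: the population is Normal(mu, sigma^2).
    """
    rng = random.Random(seed)
    means, s2_values = [], []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        means.append(xbar)
        s2_values.append(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    # Empirical spread of each estimator across the replications.
    return statistics.variance(means), statistics.variance(s2_values)

sigma = 2.0
th_mean, th_s2 = theoretical_variances(n=5, sigma=sigma, mu4=3 * sigma ** 4)
sim_mean, sim_s2 = simulated_variances(n=5, sigma=sigma)

print(f"V[X-bar]: theory {th_mean:.3f}, simulation {sim_mean:.3f}")
print(f"V[S^2]  : theory {th_s2:.3f}, simulation {sim_s2:.3f}")
```

For a non-normal population the same check applies, provided `mu4` is replaced by that population's fourth central moment.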