# STATISTICAL PROPERTIES
<br>


## Introduction

<br>
Depending on the estimator we choose to use, and (within that specific estimator) on the assumptions satisfied, our estimates will show a different behavior on a number of statistical properties, which we will discuss shortly. 

<br>
These statistical properties are extremely important because they provide criteria for choosing among alternative estimators. Knowledge of these properties is therefore essential to understand why we use a certain estimation procedure under certain conditions.

<br>
There are two categories of statistical properties : <br>

<ul style="list-style-type:square">
    <li>
        <b>finite-sample</b> (or small-sample) <b>properties</b>; the most desirable finite-sample properties of an estimator
        are unbiasedness, minimum variance, and efficiency <br>
        [<b>F1 - F3</b>]
    </li>
    <br>
    <li>
        <b>asymptotic</b> (or large-sample) <b>properties</b>; the most desirable asymptotic property of an estimator is
        consistency <br>
        [<b>A1</b>]
    </li>
</ul>

<br>
Both sets of statistical properties refer to the properties of the sampling distribution (or probability distribution) of the estimator $\hat{\boldsymbol{\beta}}$ for different sample sizes. 

## Finite-Sample Properties

<br>
The finite-sample properties of the estimator refer to the properties of the sampling distribution of $\hat{\boldsymbol{\beta}}$ for any sample of fixed size N, where N is a finite number denoting the number of observations in the sample. In fact, there is a family of finite-sample distributions for the estimator, one for each finite value of N.

<br>
The sampling distribution of $\hat{\boldsymbol{\beta}}$ is based on the concept of repeated sampling : <br>

<ul style="list-style-type:square">
    <li>
        suppose a large number of samples of size N are randomly selected from some underlying population; each of these samples
        contains N observations and (in general) different sample values of the observable random variables
    </li>
    <br>
    <li>
        for each of these samples of N observations, the estimator $\hat{\boldsymbol{\beta}}$ is used to compute a numerical
        estimate of the unknown population parameter $\boldsymbol{\beta}$, and each sample yields (in general) a different numerical estimate
    </li>
    <br>
    <li>
        if we tabulate or plot these different sample estimates of the parameter $\boldsymbol{\beta}$ for a very large number of
        samples of size N, we obtain the finite-sample distribution of the estimator $\hat{\boldsymbol{\beta}}$
    </li>
</ul>
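
<br>
The repeated-sampling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is taken to be Normal with mean 2 and unit variance, the estimator is the sample mean, and the names `draw_sample` and `sampling_distribution` are hypothetical helpers, not part of the source.

```python
import random
import statistics


def draw_sample(rng, n_obs, beta=2.0, sigma=1.0):
    """Draw one sample of size n_obs from a hypothetical Normal(beta, sigma^2) population."""
    return [rng.gauss(beta, sigma) for _ in range(n_obs)]


def sampling_distribution(estimator, n_obs, n_samples, seed=0):
    """Apply `estimator` to n_samples independent samples, each of size n_obs."""
    rng = random.Random(seed)
    return [estimator(draw_sample(rng, n_obs)) for _ in range(n_samples)]


# Steps 1 and 2: draw many samples of size N = 30 and compute one estimate per sample.
estimates = sampling_distribution(statistics.fmean, n_obs=30, n_samples=5000)

# Step 3: summarise the simulated finite-sample distribution by its mean and variance.
mean_of_estimates = statistics.fmean(estimates)
var_of_estimates = statistics.variance(estimates)
print(f"E(beta_hat) ~ {mean_of_estimates:.3f}, Var(beta_hat) ~ {var_of_estimates:.4f}")
```

Under these assumptions the simulated mean should lie close to 2 and the simulated variance close to 1/30, the theoretical variance of a sample mean of 30 unit-variance observations.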

<br>
The finite-sample distribution of the estimator $\hat{\boldsymbol{\beta}}$ for any finite sample size N has two characteristic values : a mean (or expectation) and a variance, respectively denoted as $\mathbf{E}(\hat{\boldsymbol{\beta}})$ and $\mathrm{Var}(\hat{\boldsymbol{\beta}})$. It's in terms of these two values (mean and variance) of the finite-sample distribution of the estimator $\hat{\boldsymbol{\beta}}$ that we define the finite-sample properties of the estimator itself.


### [F1] Unbiasedness

<br>
The estimator $\hat{\boldsymbol{\beta}}$ is said to be an <b>unbiased</b> estimator of the corresponding population parameter if the mean (or expectation) of the finite-sample distribution of $\hat{\boldsymbol{\beta}}$ is equal to the true value $\boldsymbol{\beta}$ : <br>

$ 
    \quad 
    \mathbf{E}(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta} \quad \text{for any given finite sample size } 0 < N < \infty 
$

<br>
The <b>bias</b> of the estimator $\hat{\boldsymbol{\beta}}$ is defined as $ \Big[ \mathbf{E}(\hat{\boldsymbol{\beta}}) - \boldsymbol{\beta} \Big] $; the estimator $\hat{\boldsymbol{\beta}}$ is said to be : <br>

<ul style="list-style-type:square">
    <li>
        <b>upward biased</b> (or positively biased) if the bias is greater than zero 
        $ \quad \Big( \mathbf{E}[\hat{\boldsymbol{\beta}}] - \boldsymbol{\beta} > 0 \Big) $
    </li>
    <br>
    <li>
        <b>downward biased</b> (or negatively biased) if the bias is less than zero 
        $ \quad \Big( \mathbf{E}[\hat{\boldsymbol{\beta}}] - \boldsymbol{\beta} < 0 \Big) $
    </li>
</ul>

<br>
If the estimator is unbiased, it means that on average it's correct, even though any single estimate for a particular sample of data may not equal $\boldsymbol{\beta}$. More technically, it means that the finite-sample distribution of the
estimator is centered on the value $\boldsymbol{\beta}$, rather than some other real value. 

<br>
The bias of an estimator is an inverse measure of its average accuracy; the smaller in absolute value is the bias, the more accurate on average is the estimator. 
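
<br>
A classic illustration of bias, not taken from the source, is the estimation of a population variance: dividing the sum of squared deviations by N gives a downward-biased estimator, while dividing by N - 1 gives an unbiased one. The sketch below assumes a Normal population with variance 4 and approximates each expectation by an average over many simulated samples.

```python
import random
import statistics

rng = random.Random(1)
N, REPS, TRUE_VAR = 5, 20000, 4.0  # assumed population: Normal(0, 2^2)

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(REPS):
    sample = [rng.gauss(0.0, 2.0) for _ in range(N)]
    divide_by_n.append(statistics.pvariance(sample))         # divisor N
    divide_by_n_minus_1.append(statistics.variance(sample))  # divisor N - 1

# Simulated bias = average estimate minus the true parameter value.
bias_n = statistics.fmean(divide_by_n) - TRUE_VAR
bias_n_minus_1 = statistics.fmean(divide_by_n_minus_1) - TRUE_VAR
print(f"bias (divisor N):     {bias_n:+.3f}")
print(f"bias (divisor N - 1): {bias_n_minus_1:+.3f}")
```

In theory the first bias equals minus the true variance divided by N (here -0.8), so that estimator is downward biased, while the second bias is zero up to simulation noise.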

### [F2] Minimum Variance

<br>
The estimator $\hat{\boldsymbol{\beta}}$ is said to be a minimum-variance estimator of the corresponding population parameter if the variance of the finite-sample distribution of $\hat{\boldsymbol{\beta}}$ is less than or equal to the variance of the finite-sample distribution of $\tilde{\boldsymbol{\beta}}$, where the latter is any other estimator of the same population parameter :

$
    \quad 
    \mathrm{Var}(\hat{\boldsymbol{\beta}}) \ \leq \ \mathrm{Var}(\tilde{\boldsymbol{\beta}})
    \quad \text{for any given finite sample size } 0 < N < \infty 
$

<br>
The variance of an estimator measures the dispersion of its sampling distribution around its mean, so it is an inverse measure of statistical precision: the smaller the variance of an estimator, the more statistically precise it is. 

<br>
It's essential to notice that the statistical property of minimum variance implies nothing about whether the estimator is biased or
unbiased. 

<br>
A minimum variance estimator is therefore the statistically most precise estimator of an unknown population parameter, although it may be biased or unbiased. 
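
<br>
To make the point concrete, the sketch below compares the sample mean with a deliberately shrunken version of it. The shrinkage factor 0.5 and the Normal population with mean 3 are arbitrary assumptions chosen for illustration; they do not come from the source.

```python
import random
import statistics

rng = random.Random(2)
BETA, N, REPS = 3.0, 20, 10000  # assumed true parameter, sample size, replications

mean_estimates, shrunk_estimates = [], []
for _ in range(REPS):
    sample = [rng.gauss(BETA, 1.0) for _ in range(N)]
    xbar = statistics.fmean(sample)
    mean_estimates.append(xbar)          # unbiased estimator of BETA
    shrunk_estimates.append(0.5 * xbar)  # hypothetical shrinkage estimator

for name, est in (("sample mean", mean_estimates), ("0.5 * mean", shrunk_estimates)):
    bias = statistics.fmean(est) - BETA
    var = statistics.variance(est)
    print(f"{name:12s} bias = {bias:+.3f}  variance = {var:.4f}")
```

The shrunken estimator has one quarter of the variance of the sample mean, so it is more precise, yet it is centered on 1.5 rather than on 3: a smaller variance says nothing about bias.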


### [F3] Efficiency

<br>
The finite-sample property of efficiency is defined only for unbiased estimators. Equivalently, a necessary condition for efficiency of the estimator $\hat{\boldsymbol{\beta}}$ is that it must be an unbiased estimator of the corresponding population parameter.

<br>
Let $\hat{\boldsymbol{\beta}}$ and $\tilde{\boldsymbol{\beta}}$ be two unbiased estimators of the same population parameter $\boldsymbol{\beta}$. The estimator $\hat{\boldsymbol{\beta}}$ is efficient relative to the estimator $\tilde{\boldsymbol{\beta}}$ if the variance of the finite-sample distribution of the former is less than or equal to the variance of the finite-sample distribution of the latter.

<br>
<b>Efficiency = Unbiasedness + Minimum Variance</b>

<br>
Efficiency is a desirable statistical property because it provides a criterion for choosing, among a number of unbiased estimators, the one with the smallest dispersion around its mean, i.e. the most precise one.
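
<br>
Relative efficiency can be illustrated by comparing two unbiased estimators of the center of a symmetric population. The sketch below assumes a standard Normal population and uses the sample mean and the sample median; this pairing is a textbook example, not one taken from the source.

```python
import random
import statistics

rng = random.Random(4)
BETA, N, REPS = 0.0, 25, 10000  # assumed population: Normal(0, 1)

mean_estimates, median_estimates = [], []
for _ in range(REPS):
    sample = [rng.gauss(BETA, 1.0) for _ in range(N)]
    mean_estimates.append(statistics.fmean(sample))
    median_estimates.append(statistics.median(sample))

# Both estimators are centered on BETA, so we compare their variances.
var_mean = statistics.variance(mean_estimates)
var_median = statistics.variance(median_estimates)
relative_efficiency = var_median / var_mean
print(f"Var(mean) = {var_mean:.4f}, Var(median) = {var_median:.4f}")
print(f"Var(median) / Var(mean) = {relative_efficiency:.2f}")
```

For Normal data the variance ratio exceeds one, so the sample mean is efficient relative to the sample median under these assumptions. For other populations the ranking can differ.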

## Asymptotic Properties

<br>
The asymptotic (or large-sample) properties of an estimator $\hat{\boldsymbol{\beta}}$ refer to the properties of the sampling distribution of that estimator as the sample size m becomes very (or indefinitely) large, i.e. as m approaches infinity.

<br>
The sampling distribution of an estimator $\hat{\boldsymbol{\beta}}$ differs for different sample sizes; in general, for two sample sizes $m_1 \neq m_2$, the sampling distributions of $\hat{\boldsymbol{\beta}}_{\boldsymbol{m_1}}$ and $\hat{\boldsymbol{\beta}}_{\boldsymbol{m_2}}$ will differ in their means, variances, and mathematical forms.

<br>
The most desirable asymptotic properties of an estimator are : <br>

<ul style="list-style-type:square">
    <li>
        <b>Consistency</b>
    </li>
    <br>
    <li>
        <b>Asymptotic Unbiasedness</b>
    </li>
    <br>
    <li>
        <b>Asymptotic Efficiency</b>
    </li>
</ul>

<br>
Before we dive into the details of these large-sample properties, we first need to distinguish between two sampling distributions of the estimator: the <b>asymptotic distribution</b> and the <b>ultimate (or final) distribution</b>.


### Asymptotic and Ultimate distributions

<br>
Loosely speaking, the <b>asymptotic</b> distribution of an estimator is the distribution that the sampling distribution of the estimator approaches as sample size m grows towards infinity; a more precise definition is given once the ultimate distribution has been introduced.

<br>
For many estimators, the sampling distribution collapses to a single point as sample size m approaches infinity: <br>

<ul style="list-style-type:square">
    <li>
        more specifically, as sample size $m \rightarrow \infty$, the sampling distribution of the estimator collapses to
        a column of unit probability mass (or unit probability density) at a single point on the real line
    </li>
    <br>
    <li>
        such an estimator is said to converge in probability to some value
    </li>
    <br>
    <li>
        a distribution that is completely concentrated at one point (on the real line) is called a degenerate distribution.
        Graphically, a degenerate distribution is represented by a perpendicular to the real line with height equal to one 
    </li>
</ul>

<br>
The <b>ultimate</b> distribution of an estimator is the distribution to which the sampling distribution of the estimator converges as sample size m "reaches" infinity. 

<br>
We can now define the asymptotic distribution of an estimator as the distribution to which the sampling distribution of the estimator converges just before it collapses (if it does) as sample size m approaches infinity. If the ultimate distribution of an estimator is degenerate, then its asymptotic distribution is not identical to its ultimate distribution. If the ultimate distribution of an estimator is non-degenerate, then its asymptotic distribution is identical to its ultimate distribution. 
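
<br>
As an illustration that is not taken from the source, consider the sample mean $\bar{x}_m$ of $m$ independent draws from a population with mean $\mu$ and finite variance $\sigma^2$ (the symbols $\bar{x}_m$, $\mu$ and $\sigma^2$ are assumptions introduced only for this example). By the law of large numbers its ultimate distribution is degenerate at $\mu$, while the central limit theorem gives a non-degenerate limit for the rescaled estimator : <br>

$
    \quad
    \bar{x}_m \xrightarrow{\ p \ } \mu
    \qquad \text{and} \qquad
    \sqrt{m} \, (\bar{x}_m - \mu) \xrightarrow{\ d \ } N(0, \sigma^2)
    \qquad \text{so that, for large } m, \quad
    \bar{x}_m \overset{a}{\sim} N \Big( \mu, \frac{\sigma^2}{m} \Big)
$

<br>
In this example the two distributions differ: the ultimate distribution is a single point at $\mu$, whereas the asymptotic distribution is normal with a variance $\sigma^2 / m$ that is positive for every finite $m$.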

### [A1] Consistency

<br>
The estimator $\hat{\boldsymbol{\beta}}$ is a consistent estimator of the population parameter $\boldsymbol{\beta}$ if its sampling distribution converges to (or collapses on) the value of the population parameter as $m \rightarrow \infty$ : <br>

$
    \quad
    \begin{align}
        \quad
        & \quad p\lim_{m \rightarrow \infty}
          \hat{\boldsymbol{\beta}} = \boldsymbol{\beta}
        & \text{the estimator }
          \hat{\boldsymbol{\beta}} 
          \text{ converges in probability to the population parameter } 
          \boldsymbol{\beta}
        \newline
        \text{or} & \quad \lim_{m \rightarrow \infty} 
                    Pr \Big( \lvert \hat{\boldsymbol{\beta}} - \boldsymbol{\beta} \rvert \leq \boldsymbol{\varepsilon} \Big) = 1 
                    \quad \text{for any } \boldsymbol{\varepsilon} > 0
        & \text{the probability that }
          \hat{\boldsymbol{\beta}} 
          \text{ is arbitrarily close to }
          \boldsymbol{\beta} 
          \text{ approaches 1 as the sample size }
          m \rightarrow \infty
    \end{align}
$

<br>
It means that, as the sample size $m$ becomes larger and larger (approaches infinity) : <br>

<ul style="list-style-type:square">
    <li>
         the sampling distribution of the estimator $\hat{\boldsymbol{\beta}}$ becomes more and more concentrated around
         $\boldsymbol{\beta} $
    </li>
    <br>
    <li>
         the value returned by the estimator (the estimate) is more and more likely to be very close to $\boldsymbol{\beta}$
    </li>
</ul>
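
<br>
The two statements above can be checked by simulation. The sketch below assumes a Normal population with mean 1 and unit variance, uses the sample mean as the estimator, and fixes a tolerance of 0.1; the helper name `coverage_probability` is hypothetical. It estimates the probability that the estimate falls within the tolerance of the true value for increasing sample sizes.

```python
import random
import statistics


def coverage_probability(m, eps, beta=1.0, reps=1000, seed=3):
    """Estimate Pr(|beta_hat_m - beta| <= eps) for the sample mean of m observations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(beta, 1.0) for _ in range(m)]
        if abs(statistics.fmean(sample) - beta) <= eps:
            hits += 1
    return hits / reps


# The estimated probability should rise towards 1 as the sample size m grows.
probs = {m: coverage_probability(m, eps=0.1) for m in (10, 100, 1000)}
for m, p in probs.items():
    print(f"m = {m:5d}  Pr(|beta_hat - beta| <= 0.1) ~ {p:.3f}")
```

A smaller tolerance requires a larger sample size before the probability gets close to one, but for a consistent estimator it eventually does so for any positive tolerance.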

<br>
The probability limit is defined, with more details, in appendix <b>AX-1</b>.


#### [A1] Consistency - A Necessary Condition

<br>
Let $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ be an estimator of the population parameter $\boldsymbol{\beta}$ based on a sample of size $m$ observations. 

<br>
A necessary condition for consistency of the estimator is for the ultimate distribution of $\hat{\boldsymbol{\beta}}$ to be a degenerate distribution at some point on the real line, i.e. to converge to a single point on the real line. 


#### [A1] Consistency - A Sufficient Condition

<br>
Let $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ be an estimator of the population parameter $\boldsymbol{\beta}$ based on a sample of size $m$ observations. 

<br>
If both the bias and variance of the estimator $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ approach zero as the sample size $m$ approaches infinity, then $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ is a consistent estimator of $\boldsymbol{\beta}$ : <br>

$
    \quad
    \lim_{m \rightarrow \infty} \text{Bias}(\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}) = 0
    \quad \text{(equivalently, }
    \lim_{m \rightarrow \infty} \mathbf{E}(\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}) = \boldsymbol{\beta}
    \text{)} \quad \textbf{and} \quad
    \lim_{m \rightarrow \infty} \mathrm{Var}(\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}) = 0    
$
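
<br>
A sketch of why this condition is sufficient, assuming a scalar parameter and an estimator with a finite second moment (assumptions not stated in the source): applying Markov's inequality to the squared estimation error gives, for any $\boldsymbol{\varepsilon} > 0$, <br>

$
    \quad
    Pr \Big( \lvert \hat{\boldsymbol{\beta}}_{\boldsymbol{m}} - \boldsymbol{\beta} \rvert > \boldsymbol{\varepsilon} \Big)
    \ \leq \
    \frac{\mathbf{E} \big[ (\hat{\boldsymbol{\beta}}_{\boldsymbol{m}} - \boldsymbol{\beta})^2 \big]}{\boldsymbol{\varepsilon}^2}
    \ = \
    \frac{\mathrm{Var}(\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}) + \big[ \text{Bias}(\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}) \big]^2}{\boldsymbol{\varepsilon}^2}
    \ \longrightarrow \ 0
    \quad \text{as } m \rightarrow \infty
$

<br>
Hence the probability that $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ lies within $\boldsymbol{\varepsilon}$ of $\boldsymbol{\beta}$ approaches one. The condition is sufficient but not necessary: an estimator can be consistent even if its bias or variance does not vanish in the limit.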


## Appendix
<br>


### [AX-1] Probability Limit

<br>
Let $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ be an estimator of the population parameter $\boldsymbol{\beta}$ with a degenerate ultimate distribution, i.e. the sampling distribution of $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ collapses to a column of unit density as the sample size approaches infinity.

<br>
The point $\boldsymbol{\beta_0}$ on which the ultimate sampling distribution converges is called the <b>probability limit</b> of $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ and is denoted as 
$ p\lim \hat{\boldsymbol{\beta}}_{\boldsymbol{m}} $ or 
$ p\lim_{m \rightarrow \infty} \hat{\boldsymbol{\beta}}_{\boldsymbol{m}} $ :

$
    \quad
    \begin{align*}
        \lim_{m \rightarrow \infty} Pr
        \Big(
            \boldsymbol{\beta_0} - \boldsymbol{\varepsilon} \leq
            \hat{\boldsymbol{\beta}}_{\boldsymbol{m}} \leq
            \boldsymbol{\beta_0} + \boldsymbol{\varepsilon}
        \Big)
        &= \lim_{m \rightarrow \infty} Pr
        \Big(
            - \boldsymbol{\varepsilon} \leq
            \hat{\boldsymbol{\beta}}_{\boldsymbol{m}} - \boldsymbol{\beta_0} \leq
            + \boldsymbol{\varepsilon}
        \Big)
        \newline
        &= \lim_{m \rightarrow \infty} Pr
        \Big( 
            \lvert \hat{\boldsymbol{\beta}}_{\boldsymbol{m}} - \boldsymbol{\beta_0} \rvert \leq \boldsymbol{\varepsilon}
        \Big)
        \newline
        &= 1
    \end{align*}
$

where $\boldsymbol{\varepsilon}$  is an arbitrarily small positive number.

<br>
As sample size $m \rightarrow \infty$, the estimator $\hat{\boldsymbol{\beta}}_{\boldsymbol{m}}$ "converges in probability" to the point $\boldsymbol{\beta_0}$.


## References

<br>
<ul style="list-style-type:square">
    <li>
        Queen's University at Kingston - Economics 351 - M.G. Abbott -
        <a href="https://bit.ly/2kqeRUV">
        Desirable Statistical Properties of Estimators</a>        
    </li>
</ul>