# EBA3500 Lecture 11. Expectation, consistency, and the adjusted $R^2$

### Outline
1. Mention the problem with default values etc.
2. Expectation
    * What it means.
    * Small simulation to illustrate.
    * The definition of an unbiased estimator.
    * The estimated betas in a linear regression are unbiased, but most estimators are not unbiased.
    * The variance estimator $S^2$ is also unbiased.
    * Unbiasedness is NOT important in most cases, but we are dealing with an exception now.
3. Consistency of an estimator
    * What it means.
    * Small simulation to illustrate.
    * Mathematical definition.
    * Theorem: Unbiasedness and variance converging to 0 imply consistency.
4. Adjusted $R^2$:
    1. Why we have to adjust the $R^2$: A simulation example.
    2. Why we adjust the $R^2$: A real example.
    3. The reasoning behind the adjustment.

## Expectation
Recall the expectation operator $E$: $E(X)$ is the theoretical mean of the random variable $X$.

#### Linearity of expectation
Let $X_1, X_2$ be random variables and $a, b$ be numbers. Then
$E(aX_1 + bX_2) = aEX_1+bEX_2$.

Linearity is a general notion: a function $f$ is linear if $f(ax + by) = af(x) + bf(y)$ for all numbers $a, b$. For instance, matrix multiplication is linear, since $A(ax + by) = aAx + bAy$.
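A small simulation can make linearity of expectation concrete. The sketch below assumes numpy and uses distributions chosen only for illustration: $X_1$ is exponential with scale 2, so $EX_1 = 2$ and $E(X_1^2) = 8$, and $X_2 = X_1^2$. Because $X_2$ is a function of $X_1$, the two variables are strongly dependent. Linearity still holds, because it does not require independence. The last lines check numerically that matrix multiplication is linear as well.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
a, b = 3.0, -1.0

# X1 ~ Exponential with scale 2, so E(X1) = 2 and E(X1^2) = 2 * 2**2 = 8.
x1 = rng.exponential(scale=2.0, size=n)
# X2 = X1^2 depends strongly on X1; linearity of expectation does not need independence.
x2 = x1**2

simulated = np.mean(a * x1 + b * x2)  # Monte Carlo approximation of E(aX1 + bX2)
exact = a * 2.0 + b * 8.0             # aE(X1) + bE(X2) = 6 - 8 = -2
print(simulated, exact)

# Matrix multiplication is linear too: A(au + bv) = aAu + bAv.
A = rng.normal(size=(3, 3))
u, v = rng.normal(size=3), rng.normal(size=3)
is_linear = np.allclose(A @ (a * u + b * v), a * (A @ u) + b * (A @ v))
print(is_linear)
```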

### Estimator
Let $\theta$ be some population value, e.g., an expectation or a regression coefficient. This value is typically unknown. An estimator $\hat{\theta}_n$ is a quantity computed from $n$ observed data points that approximates this population value. Whenever we say population value, think of an input to the data-generating process: these are the exact values the creator of a simulation study decides on.

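```python
# Simulate a regression model y = beta0 + beta1 * x + noise (assumes numpy and scipy)
import numpy as np
from scipy import stats
rng = np.random.default_rng(313)
beta0, beta1 = 1.0, 2.0  # Not estimators; these are population values.
x = rng.normal(size=100)
y = beta0 + beta1 * x + rng.normal(size=100)
fit = stats.linregress(x, y)
print(fit.intercept)  # Estimator of beta_0
print(fit.slope)      # Estimator of beta_1
print(fit.pvalue)     # Not an estimator, but a p-value.
```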

#### Definition (Convergence in probability)
> An estimator $\hat{\theta}_{n}$ converges in probability to $\theta$ if $P(|\hat{\theta}_{n}-\theta|>\epsilon)\to0$ for all $\epsilon>0$ as $n\to\infty$.
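The probability in this definition can be approximated by simulation. The sketch below assumes numpy, takes $\hat{\theta}_n$ to be the sample mean of $n$ draws from $N(0, 1)$ (so $\theta = 0$), fixes $\epsilon = 0.1$ for illustration, and estimates $P(|\hat{\theta}_n - \theta| > \epsilon)$ as the fraction of simulated data sets whose error exceeds $\epsilon$. The estimated probability should shrink toward zero as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, eps, n_reps = 0.0, 0.1, 1_000

# For each n, simulate n_reps data sets, compute the sample mean of each,
# and record how often the error |mean - mu| exceeds eps.
probs = {}
for n in [10, 100, 1_000, 5_000]:
    means = rng.normal(mu, 1.0, size=(n_reps, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(means - mu) > eps)
    print(n, probs[n])
```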

#### Definition (Consistency)
> An estimator $\hat{\theta}_n$ is *consistent* for $\theta$ if it converges in probability to $\theta$.

Consistency roughly means that the histogram of an estimator concentrates arbitrarily tightly around the true value as $n\to\infty$.

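```python
# Simulate from the normal distribution N(mu, 1) for growing n (assumes numpy)
import numpy as np
rng = np.random.default_rng(313)
mu, n_reps = 1.0, 1000
for n in [10, 100, 1000]:
    x = rng.normal(mu, 1.0, size=(n_reps, n))
    medians = np.median(x, axis=1)  # Simulate the median
    means = x.mean(axis=1)          # Simulate the mean
    print(n, np.mean(np.abs(medians - mu)), np.mean(np.abs(means - mu)))
```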


It appears that the median and mean are consistent for the $\mu$ parameter in the normal distribution. This is, in fact, true. 

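```python
# Simulate from the exponential distribution with mean 1 for growing n (assumes numpy)
import numpy as np
rng = np.random.default_rng(313)
mean, n_reps = 1.0, 1000
for n in [10, 100, 1000]:
    x = rng.exponential(scale=mean, size=(n_reps, n))
    medians = np.median(x, axis=1)  # Simulate the median
    means = x.mean(axis=1)          # Simulate the mean
    print(n, np.mean(np.abs(medians - mean)), np.mean(np.abs(means - mean)))
```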

We conclude, informally, that the sample median isn't consistent for the mean of the exponential distribution. Do you understand why? 
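A short derivation answers the question. Assume the exponential distribution has scale $\beta$, so its mean is $E(X) = \beta$. The population median $m$ solves $P(X \le m) = 1/2$:

$$
P(X \le m) = 1 - e^{-m/\beta} = \frac{1}{2}
\quad\Longrightarrow\quad
m = \beta \ln 2 \approx 0.693\,\beta \neq \beta = E(X).
$$

The sample median is consistent for the population median $\beta \ln 2$. It therefore cannot also converge to the mean $\beta$.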

This isn't a course in mathematics, and proving consistency is often quite difficult. It is important, however, to know what it means.

#### Definition (Unbiased estimator)
> An estimator is *unbiased* if $E(\hat{\theta}_n) = \theta$.

Most popular estimators are not unbiased, and unbiasedness is not an important property in most scenarios. For instance, the estimated regression coefficients in a logistic regression are not unbiased. Neither are the estimated regression coefficients when using least absolute deviations. However, the sample variance
$S^2 = \frac{1}{n-1}\sum (X_i - \overline{X})^2$
is unbiased for the population variance $\sigma^2$.
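The unbiasedness of $S^2$ can be checked by simulation. The sketch below assumes numpy and normal data with $\sigma^2 = 4$ and $n = 5$, both chosen for illustration. It averages $S^2$, which divides by $n - 1$ (`ddof=1`), over many data sets and compares the result with the version that divides by $n$ (`ddof=0`). The second version has expectation $\frac{n-1}{n}\sigma^2$, so it is biased downward.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, n_reps = 4.0, 5, 200_000

# Each row is one data set of size n from N(0, sigma2).
x = rng.normal(0.0, np.sqrt(sigma2), size=(n_reps, n))
s2 = np.var(x, axis=1, ddof=1)         # S^2: divides by n - 1
s2_biased = np.var(x, axis=1, ddof=0)  # divides by n instead

# Averages over many data sets approximate the expectations.
print(s2.mean())         # close to sigma2 = 4
print(s2_biased.mean())  # close to (n - 1) / n * sigma2 = 3.2
```

A small $n$ is chosen deliberately, because the bias of the $1/n$ version, $-\sigma^2/n$, is largest there.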

### Proposition
> Suppose the model conditions for the linear regression model hold. Then the estimated regression coefficients $\hat{\beta}_i$ are unbiased, have variance converging to $0$, and are consistent.
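The proposition can be illustrated by repeating a regression many times. The sketch below assumes numpy, a simple linear model with population values $(\beta_0, \beta_1) = (1, 2)$, standard normal predictors, and $N(0, 1)$ errors. All of these are illustrative choices. The model is fitted by least squares with `np.linalg.lstsq`. The averages of the estimates should sit close to the population values for every $n$, which reflects unbiasedness, and their variances should shrink as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0])  # population values (beta_0, beta_1), chosen for illustration
n_reps = 2_000

def simulate_betas(n):
    """Return an (n_reps, 2) array of OLS estimates from n_reps simulated data sets."""
    estimates = np.empty((n_reps, 2))
    for r in range(n_reps):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column
        y = X @ beta + rng.normal(size=n)     # linear model with N(0, 1) errors
        estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return estimates

results = {}
for n in [20, 200, 2_000]:
    est = simulate_betas(n)
    results[n] = (est.mean(axis=0), est.var(axis=0))  # (average estimate, variance)
    print(n, results[n])
```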

That an estimator $\hat{\theta}$ is unbiased and has variance converging to $0$ actually implies that $\hat{\theta}$ is consistent.

### Proposition
> Suppose that $\hat{\theta}_n$ is unbiased for $\theta$, i.e., $E(\hat{\theta}_n) = \theta$. Moreover, suppose that the variance of $\hat{\theta}_n$ converges to $0$ as $n\to\infty$. Then $\hat{\theta}_n \to \theta$ in probability. In other words, $\hat{\theta}_n$ is consistent for $\theta$.

##### Proof
Let $\sigma_{n}^{2}=\textrm{Var}\,\hat{\theta}_{n}$. Since $E(\hat{\theta}_n) = \theta$, [Chebyshev's inequality](https://en.wikipedia.org/wiki/Chebyshev%27s_inequality) gives
$$
P(|\hat{\theta}_{n}-\theta|\geq\epsilon)\leq\frac{\sigma_{n}^{2}}{\epsilon^{2}}.
$$
Let $\epsilon > 0$ be fixed. Since $\sigma_{n}^{2}\to0$ by assumption,
$\frac{\sigma_{n}^{2}}{\epsilon^{2}}\to0$ as well. Then, since $P(|\hat{\theta}_{n}-\theta|\geq\epsilon) \leq\frac{\sigma_{n}^{2}}{\epsilon^{2}}$,
we find that $P(|\hat{\theta}_{n}-\theta|\geq\epsilon) \to 0$ too.
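The bound in the proof can be compared with simulated probabilities. The sketch below assumes numpy and takes $\hat{\theta}_n$ to be the sample mean of $n$ i.i.d. $N(0, 1)$ draws. Then $\theta = 0$ and $\sigma_n^2 = 1/n$. The value $\epsilon = 0.2$ is chosen for illustration. The simulated probability should never exceed the Chebyshev bound $\sigma_n^2/\epsilon^2$, which is capped at 1, and both should go to zero.

```python
import numpy as np

rng = np.random.default_rng(5)
eps, n_reps = 0.2, 1_000

checks = []
for n in [10, 50, 250, 1_000]:
    # theta_hat_n = sample mean of n i.i.d. N(0, 1) draws, so theta = 0 and sigma_n^2 = 1 / n.
    theta_hat = rng.normal(size=(n_reps, n)).mean(axis=1)
    empirical = np.mean(np.abs(theta_hat) >= eps)  # simulated P(|theta_hat_n - theta| >= eps)
    bound = min((1 / n) / eps**2, 1.0)             # Chebyshev bound sigma_n^2 / eps^2, capped at 1
    checks.append((n, empirical, bound))
    print(n, empirical, bound)
```

Chebyshev's bound is usually far from tight. For $n = 250$, for example, it is $0.1$, while the actual probability is much smaller. It is still enough to prove consistency.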

### Corollary: Law of large numbers
Assume $X_1, X_2, \ldots$ is a sequence of independent, identically distributed random variables with common mean $\mu$ and finite variance $\sigma^2$. Let $\overline{X}_n = n^{-1}\sum_{i=1}^n{X_i}$ denote the sample mean. Then $\overline{X}_n\to\mu$ in probability.

#### Proof
Exercise.
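The proof is left as an exercise, but a simulation can illustrate the statement. The sketch below assumes numpy and i.i.d. exponential draws with mean $\mu = 3$, an illustrative choice with finite variance $\mu^2$. It computes the running means $\overline{X}_1, \overline{X}_2, \ldots, \overline{X}_n$ along a single simulated sequence.

```python
import numpy as np

rng = np.random.default_rng(6)
mu = 3.0
n = 100_000

# i.i.d. draws from an exponential distribution with mean mu (variance mu**2).
x = rng.exponential(scale=mu, size=n)
running_mean = np.cumsum(x) / np.arange(1, n + 1)  # X-bar_1, X-bar_2, ..., X-bar_n

for k in [10, 100, 1_000, 10_000, 100_000]:
    print(k, running_mean[k - 1])
```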


## Adjusted $R^2$

### Constructing the adjusted $R^2$
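One widely used form of the adjusted $R^2$ with $p$ predictors (intercept not counted) and $n$ observations is $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$. This lecture's own construction may present it differently.

The sketch below assumes numpy and a response $y$ that is pure noise, unrelated to every predictor. The values $n = 50$ and $p \in \{1, 5, 20\}$ are illustrative choices. In this setting the ordinary $R^2$ still grows with $p$, with mean $p/(n-1)$ under normal data. The adjusted $R^2$, by contrast, averages close to zero.

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_reps = 50, 2_000

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X must include an intercept column)."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 with p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

summary = {}
for p in [1, 5, 20]:
    r2s, adj = [], []
    for _ in range(n_reps):
        # y is pure noise: none of the p predictors is related to it.
        X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
        y = rng.normal(size=n)
        r2 = r_squared(X, y)
        r2s.append(r2)
        adj.append(adjusted_r_squared(r2, n, p))
    summary[p] = (np.mean(r2s), np.mean(adj))  # (average R^2, average adjusted R^2)
    print(p, summary[p])
```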

## Summary
1. The expectation of a random variable $X$ is denoted by $E(X)$ and equals the theoretical mean of the random variable $X$.
2. An estimator approximates a population value based on observed data.
3. An estimator is *consistent* if it approximates the population value arbitrarily well as $n\to \infty$.
4. An estimator $\hat{\theta}_n$ is *unbiased* if it equals $\theta$ in expectation, i.e., $E[\hat{\theta}_n]=\theta$.
5. Unbiased estimation is usually not important, but it makes sense to correct the $R^2$ for bias.
6. One attempt at bias-corrected $R^2$ is the adjusted $R^2$.