### Estimators

#### Estimation examples

**Parameter estimation** - Estimating a single parameter, for example the DC level of a noisy signal or the time of flight of a radar signal

**Random variable estimation** - Estimating an inaccessible random variable (RV) from another RV that it is correlated with. Example pairs: height/weight, temperature/ice cream sales.

**Signal estimation** - Forecasting or denoising

#### Estimator model

An estimator is a rule/function that maps a realization \\([x_0, x_1, \ldots, x_{N-1}]\\) to an estimate \\(\widehat{\theta}\\) of the unknown parameter \\(\theta\\)

$$
\widehat{\theta}=g(x_0, x_1, \ldots, x_{N-1})
$$

An estimator:

- uses data
- has an objective function
- has a model
- maps a realization of the data to the estimate \\(\widehat{\theta}\\)




#### Estimator Evaluation

##### Estimator Bias

The estimator bias is defined as

$$
Bias = \mathbb{E}[\theta-\widehat{\theta}]
$$

For an estimator to be unbiased, its expected value has to equal the true value

$$
\mathbb{E}[\widehat{\theta}]=\theta
$$

##### Estimator Variance

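An estimator is a mapping of one or several random variables and is therefore a random variable itself. Its variance, \\(Var(\widehat{\theta})=\mathbb{E}[(\widehat{\theta}-\mathbb{E}[\widehat{\theta}])^2]\\), measures how much the estimate scatters from one realization of the data to the next.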

##### Examples

Estimate the DC level in white noise. The signal model is \\(x_n=A+w_n\\), where \\(w_n\\) is Gaussian with zero mean and variance \\(\sigma_w^2\\).

The PDF of a single sample \\(x\\), parameterized by \\(A\\), is

$$
f(x;A)=\frac{1}{\sqrt{2\pi\sigma_w^2}} e^{\frac{-(x-A)^2}{2\sigma_w^2}}
$$


If we define our estimator of A to be \\(\widehat{A}=x_0\\) then we can calculate the bias and variance

$$
Bias(\widehat{A})=\mathbb{E}[A-\widehat{A}]=\mathbb{E}[A]-\mathbb{E}[\widehat{A}]=\mathbb{E}[A]-\mathbb{E}[x_0]=A-A=0
$$

This holds because \\(A\\) is a deterministic constant, so \\(\mathbb{E}[A]=A\\), and because the noise has zero mean, so \\(\mathbb{E}[x_0]=A\\).

The variance of this estimator is

$$
Var(\widehat{A})=Var(x_0)=\sigma_w^2
$$

If instead the estimator is the sample mean \\(\widehat{A}=\frac{1}{N}\sum_{i=0}^{N-1}x_i\\), it is still unbiased, and its variance becomes

$$
Var(\widehat{A})=Var(\frac{1}{N}\sum_{i=0}^{N-1}x_i)=\frac{1}{N^2}N\sigma_w^2=\frac{\sigma_w^2}{N}
$$

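This follows from the scaling property \\(Var(cX)=c^2Var(X)\\) and the additive property of the variance for independent (or uncorrelated) random variables, \\(Var(X+Y)=Var(X)+Var(Y)\\).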
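The bias and variance of the two estimators above can be checked numerically. The following is a minimal Monte Carlo sketch, not a definitive implementation. The values \\(A=1\\), \\(\sigma_w=2\\), \\(N=50\\) and the number of trials are illustrative assumptions, and the function name `simulate_dc_estimators` is hypothetical.

```python
import numpy as np


def simulate_dc_estimators(A=1.0, sigma_w=2.0, N=50, trials=20_000, seed=0):
    """Monte Carlo comparison of two estimators of a DC level in white Gaussian noise.

    All default values are illustrative assumptions.
    Returns {name: (bias, variance)} with bias defined as E[A - A_hat].
    """
    rng = np.random.default_rng(seed)
    # Each row is one realization x_0, ..., x_{N-1} of x_n = A + w_n.
    x = A + sigma_w * rng.standard_normal((trials, N))

    first_sample = x[:, 0]        # estimator A_hat = x_0
    sample_mean = x.mean(axis=1)  # estimator A_hat = (1/N) * sum(x_n)

    return {
        "first_sample": (A - first_sample.mean(), first_sample.var()),
        "sample_mean": (A - sample_mean.mean(), sample_mean.var()),
    }


if __name__ == "__main__":
    for name, (bias, var) in simulate_dc_estimators().items():
        print(f"{name}: bias = {bias:+.4f}, variance = {var:.4f}")
```

With these assumed values, the empirical variances should land close to \\(\sigma_w^2=4\\) for the single-sample estimator and \\(\sigma_w^2/N=0.08\\) for the sample mean, while both empirical biases stay near zero.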

##### Minimum Variance Unbiased Estimator

The minimum variance unbiased estimator is the unbiased estimator with the least variance.

The smallest achievable variance is determined by the "sharpness" of the PDF, viewed as a function of the parameter. The more strongly the PDF depends on the unknown parameter \\(\theta\\), the more accurate the estimator can be.

The expected sharpness is defined below as the curvature of the log-likelihood. We take the negative, since the second derivative is negative at a peak (the first derivative goes from positive, through zero, to negative).

$$
I(\theta) = -E_{\mathbf{x}} \left[ \frac{\partial^2 \ln p[\mathbf{x}; \theta]}{\partial \theta^2} \right]
$$

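The efficiency of an estimator can be evaluated by how close its variance comes to the inverse of the Fisher information \\(I(\theta)\\). This inverse is a lower bound on the variance of any unbiased estimator (the Cramér-Rao lower bound), \\(Var(\widehat{\theta}) \geq I(\theta)^{-1}\\), and an efficient estimator attains it: \\(Var(\widehat{\theta})=I(\theta)^{-1}\\).
The higher the information, the lower the variance we can expect.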
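As a worked example, the information can be computed for the DC-level model \\(x_n=A+w_n\\) used earlier. This is a sketch that assumes the \\(N\\) noise samples are independent. The log-likelihood of the whole realization is

$$
\ln p[\mathbf{x}; A] = -\frac{N}{2}\ln(2\pi\sigma_w^2) - \frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x_n-A)^2
$$

Differentiating twice with respect to \\(A\\) gives

$$
\frac{\partial \ln p[\mathbf{x}; A]}{\partial A} = \frac{1}{\sigma_w^2}\sum_{n=0}^{N-1}(x_n-A), \qquad \frac{\partial^2 \ln p[\mathbf{x}; A]}{\partial A^2} = -\frac{N}{\sigma_w^2}
$$

so that

$$
I(A) = \frac{N}{\sigma_w^2}, \qquad I(A)^{-1} = \frac{\sigma_w^2}{N}
$$

This matches the variance \\(\sigma_w^2/N\\) of the sample mean, so under these assumptions the sample mean reaches the smallest variance an unbiased estimator can have.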

#### Estimator classes

We have the following estimator classes:

- Maximum Likelihood (ML)
- Maximum A Posteriori (MAP)
- Minimum Mean Square Error (MMSE)
- Least Squares (LS)

##### Terms

Prior distribution - The believed distribution of parameter \\(\theta\\) before any data is observed.

Posterior distribution - The updated knowledge about \\(\theta\\) after data has been observed.

##### Maximum Likelihood

The maximum likelihood estimator picks the value of the parameter \\(\theta\\) under which the observed data is most likely.

**Model** - Data is a realization of a random process with unknown parameter \\(\theta\\)

**Objective**

$$
\widehat{\theta}=\argmax_{\theta}f(\mathbf{y}|\theta) 
$$

Needs **observed data** and **conditional distribution of data \\(f(y|\theta)\\)**

Under mild regularity conditions, the ML estimator is asymptotically (as the number of samples grows) unbiased and efficient.
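To make the objective concrete, the sketch below maximizes the Gaussian log-likelihood of the DC-level model \\(x_n=A+w_n\\) over a grid of candidate values. It is an illustration under assumptions, not a recommended solver: the noise standard deviation is taken as known, the grid limits and data are made up, and the function names are hypothetical. For this model the maximizer should coincide with the sample mean up to the grid spacing.

```python
import numpy as np


def gaussian_log_likelihood(x, A, sigma_w):
    """Log-likelihood of i.i.d. samples x_n = A + w_n with w_n ~ N(0, sigma_w^2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (-0.5 * n * np.log(2 * np.pi * sigma_w**2)
            - np.sum((x - A) ** 2) / (2 * sigma_w**2))


def ml_estimate_grid(x, sigma_w, grid):
    """Return the grid value of A that maximizes the log-likelihood."""
    log_lik = np.array([gaussian_log_likelihood(x, A, sigma_w) for A in grid])
    return grid[np.argmax(log_lik)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = 3.0 + 1.5 * rng.standard_normal(200)  # true A = 3.0 (assumed for the demo)
    grid = np.linspace(0.0, 6.0, 6001)        # candidate values, spacing 0.001
    print("grid ML estimate:", ml_estimate_grid(x, 1.5, grid))
    print("sample mean:     ", x.mean())
```

A grid search is used only because it mirrors the \\(\argmax\\) in the objective directly. In practice a closed-form solution or a numerical optimizer would normally be preferred.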

##### Maximum A Posteriori

Maximum a posteriori maximizes the posterior, which by Bayes' theorem satisfies \\(f(\theta \mid y) \propto f(y \mid \theta)f(\theta)\\).

**Model** - Combines the likelihood (as in ML) with the prior (the distribution of \\(\theta\\)) and finds the peak of the posterior distribution.

**Objective**

$$
\widehat{\theta}=\argmax_{\theta} f(y \mid \theta)f(\theta)
$$

Needs **observed data**, **conditional distribution of data \\(f(y|\theta)\\)** and **distribution of parameter \\(f(\theta)\\)**
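As an illustration, consider the DC-level model \\(x_n=A+w_n\\) with a Gaussian prior on \\(A\\). The prior mean `mu_0` and standard deviation `sigma_0` are assumptions introduced for this sketch, as are the function names. With a Gaussian likelihood and a Gaussian prior the posterior is Gaussian too, so its peak has a closed form: a precision-weighted average of the prior mean and the sample mean. The sketch checks that closed form against a direct grid maximization of \\(f(x \mid A)f(A)\\).

```python
import numpy as np


def map_estimate_gaussian(x, sigma_w, mu_0, sigma_0):
    """Closed-form MAP estimate of A for x_n = A + w_n, prior A ~ N(mu_0, sigma_0^2)."""
    x = np.asarray(x, dtype=float)
    precision_prior = 1.0 / sigma_0**2     # weight of the prior
    precision_data = x.size / sigma_w**2   # weight of the data
    return ((precision_prior * mu_0 + precision_data * x.mean())
            / (precision_prior + precision_data))


def map_estimate_grid(x, sigma_w, mu_0, sigma_0, grid):
    """Maximize log f(x | A) + log f(A) over a grid (additive constants dropped)."""
    x = np.asarray(x, dtype=float)
    log_lik = -np.sum((x[None, :] - grid[:, None]) ** 2, axis=1) / (2 * sigma_w**2)
    log_prior = -((grid - mu_0) ** 2) / (2 * sigma_0**2)
    return grid[np.argmax(log_lik + log_prior)]


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = 3.0 + 2.0 * rng.standard_normal(5)   # few samples, true A = 3.0 (assumed)
    grid = np.linspace(-5.0, 10.0, 15001)    # spacing 0.001
    print("sample mean (ML):", x.mean())
    print("MAP, closed form:", map_estimate_gaussian(x, 2.0, 0.0, 1.0))
    print("MAP, grid search:", map_estimate_grid(x, 2.0, 0.0, 1.0, grid))
```

With few samples the MAP estimate is pulled from the sample mean toward the prior mean. As the number of samples grows, the data term dominates and MAP approaches ML.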

##### Minimum Mean Square

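Here we minimize the expected squared error of the estimate, given the observed data.

**Model** - Combines likelihood and prior to find the "average" value, i.e. the mean of the posterior distribution, \\(\mathbb{E}[\theta \mid y]\\). For a Gaussian (or any other symmetric and unimodal) posterior, the mean coincides with the peak, so MMSE = MAP.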

**Objective**

$$
\widehat{\theta} = \argmin_{\widehat{\theta}} E[(\theta - \widehat{\theta})^2 \mid y]
$$

Needs **observed data**, **joint distribution of data and parameter** \\(f(\theta,y)\\)
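The minimizer of this objective can be found in a few lines. Expanding the square, with \\(\widehat{\theta}\\) treated as a constant under the conditional expectation, gives

$$
E[(\theta - \widehat{\theta})^2 \mid y] = E[\theta^2 \mid y] - 2\widehat{\theta}E[\theta \mid y] + \widehat{\theta}^2
$$

Setting the derivative with respect to \\(\widehat{\theta}\\) to zero,

$$
-2E[\theta \mid y] + 2\widehat{\theta} = 0 \quad\Rightarrow\quad \widehat{\theta} = E[\theta \mid y] = \int \theta f(\theta \mid y)\,d\theta
$$

so the MMSE estimate is the mean of the posterior distribution, whereas MAP picks its peak.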

##### Least Squares

**Model** - We use the linear model \\(\mathbf{y} = \mathbf{H}\theta + \epsilon\\), where the observed data is assumed to depend linearly on the parameter, but no probability distribution is assumed for the noise \\(\epsilon\\).

**Objective**

$$
\widehat{\theta}=\argmin_{\theta} (\mathbf{y} - \mathbf{H}\theta)^T(\mathbf{y} - \mathbf{H}\theta)
$$

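Solution: \\(\widehat{\theta}=(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{y}\\), provided \\(\mathbf{H}^T\mathbf{H}\\) is invertible.

Needs **observed data** (the matrix \\(\mathbf{H}\\) comes from the model)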
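The solution follows from setting the gradient of the squared error to zero. This is a short derivation sketch that assumes \\(\mathbf{H}\\) has full column rank. Writing \\(J(\theta)\\) for the objective and expanding,

$$
J(\theta) = (\mathbf{y} - \mathbf{H}\theta)^T(\mathbf{y} - \mathbf{H}\theta) = \mathbf{y}^T\mathbf{y} - 2\theta^T\mathbf{H}^T\mathbf{y} + \theta^T\mathbf{H}^T\mathbf{H}\theta
$$

$$
\frac{\partial J}{\partial \theta} = -2\mathbf{H}^T\mathbf{y} + 2\mathbf{H}^T\mathbf{H}\theta = 0 \quad\Rightarrow\quad \mathbf{H}^T\mathbf{H}\,\widehat{\theta} = \mathbf{H}^T\mathbf{y}
$$

These are the normal equations. Multiplying both sides by \\((\mathbf{H}^T\mathbf{H})^{-1}\\) gives the closed-form estimate.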


#### Fitting a straight line to data

We have some data which we assume fits the model \\(y_n=ax_n+b+w_n\\), where \\(w_n\\) is independent Gaussian noise with zero mean and variance \\(\sigma_w^2\\).

##### Least squares

We are trying to estimate the coefficient vector \\(\theta=[a, b]^T\\).

The design matrix is \\(\mathbf{H}=[\mathbf{h}_1, \mathbf{h}_2,..,\mathbf{h}_N]^T\\) with rows \\(\mathbf{h}_n=[x_n, 1]\\), and the observation vector is \\(\mathbf{y}=[y_1, y_2,..,y_N]^T\\), so that each entry of \\(\mathbf{H}\theta\\) equals \\(ax_n+b\\).
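A minimal NumPy sketch of this fit is shown below. The function name `fit_line_least_squares` and the demo data are illustrative assumptions. The columns of \\(\mathbf{H}\\) are ordered here so that the first entry of \\(\theta\\) is the slope \\(a\\) and the second is the intercept \\(b\\).

```python
import numpy as np


def fit_line_least_squares(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with rows [x_n, 1], so that H @ [a, b] = a*x + b.
    H = np.column_stack([x, np.ones_like(x)])
    # Solve the normal equations H^T H theta = H^T y.
    theta = np.linalg.solve(H.T @ H, H.T @ y)
    return theta[0], theta[1]


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = np.linspace(0.0, 10.0, 100)
    y = 2.0 * x - 1.0 + 0.5 * rng.standard_normal(x.size)  # assumed a = 2, b = -1
    a_hat, b_hat = fit_line_least_squares(x, y)
    print(f"normal equations: a = {a_hat:.3f}, b = {b_hat:.3f}")
    # Cross-check against NumPy's polynomial fit (highest power first).
    print("np.polyfit:      ", np.polyfit(x, y, 1))
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly. For ill-conditioned problems `np.linalg.lstsq` is usually the more robust choice.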

##### Maximum likelihood

The likelihood function of the data can be expressed as

$$
f(\mathbf{y}|a, b, \sigma_w^2)=\prod_{i=1}^N\frac{1}{\sqrt{2\pi \sigma_w^2}}e^{-\frac{(y_i-ax_i-b)^2}{2\sigma_w^2}}
$$
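To see how this connects to least squares, take the logarithm of the Gaussian likelihood for the line model \\(y_n=ax_n+b+w_n\\), with the \\(x_i\\) treated as known. The following is a sketch of the standard argument:

$$
\ln f(\mathbf{y}|a, b, \sigma_w^2) = -\frac{N}{2}\ln(2\pi\sigma_w^2) - \frac{1}{2\sigma_w^2}\sum_{i=1}^N (y_i - ax_i - b)^2
$$

Only the last term depends on \\(a\\) and \\(b\\), and it enters with a negative sign. Maximizing the likelihood over \\(a\\) and \\(b\\) is therefore the same as minimizing \\(\sum_{i=1}^N (y_i - ax_i - b)^2\\), so under independent Gaussian noise the ML estimates of \\(a\\) and \\(b\\) coincide with the least squares solution. Setting the derivative with respect to \\(\sigma_w^2\\) to zero additionally gives the ML estimate of the noise variance:

$$
\widehat{\sigma}_w^2 = \frac{1}{N}\sum_{i=1}^N (y_i - \widehat{a}x_i - \widehat{b})^2
$$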