## Bias of an estimator

Based on [Bias of an estimator - Wikipedia](https://en.wikipedia.org/wiki/Bias_of_an_estimator)

In statistics, the **bias** (or **bias function**) of an **estimator** $\hat{x}$ is the difference between the **estimator's expected value** $\mathrm{E}[\hat{x}]$ and the **true value of the parameter** $x$ being estimated:

$$ \mathrm{Bias}_{x} = \mathrm{E}[\hat{x}] - x $$

An estimator or decision rule with **zero bias** is called **unbiased**.

Bias can also be measured with respect to the median, rather than the mean (expected value), in which case one distinguishes **median-unbiasedness** from the usual **mean-unbiasedness** property.

Bias is related to consistency in that consistent estimators are convergent and asymptotically unbiased (hence converge to the correct value as the number of data points grows arbitrarily large), though individual estimators in a consistent sequence may be biased (so long as the bias converges to zero); see bias versus consistency.

All else being equal, an unbiased estimator is preferable to a biased estimator, but in practice biased estimators are frequently used, generally with small bias.
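As a worked illustration of the definition (a standard result, stated here under the assumption of $n$ independent, identically distributed observations with finite variance $\sigma^{2}$), consider the variance estimator that divides by $n$, written $s_{n}^{2}$. Its expected value is:

$$ \mathrm{E}[s_{n}^{2}] = \mathrm{E}\left[ \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \right] = \frac{n - 1}{n} \; \sigma^{2} $$

so its bias is:

$$ \mathrm{Bias}_{\sigma^{2}} = \mathrm{E}[s_{n}^{2}] - \sigma^{2} = - \frac{\sigma^{2}}{n} $$

The estimator is too low on average, and the bias shrinks to zero as $n$ grows. Dividing by $n - 1$ instead of $n$ removes it.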

See also the articles on [Sample Variance - Wikipedia](https://en.wikipedia.org/wiki/Variance#Sample_variance) and [Bessel's correction](https://en.wikipedia.org/wiki/Bessel%27s_correction).

## Arithmetic Mean

The **mean** of a **finite time series** (a single sequence of $n$ measurements of a variable $x$ at successive times) is estimated as:

$$ \bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$$

## Standard deviation

The **standard deviation** $\sigma$ can be estimated by the **standard deviation of the sample** $s_{n}$:

$$ s_{n} = \left( \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \right)^{1/2} $$

This is the square root of the **sample variance** $s_{n}^{2}$, which is the average of the squared deviations about the sample mean.

This is a **consistent estimator** (it converges in probability to the population value as the number of samples goes to infinity), and is the maximum-likelihood estimate when the population is normally distributed. However, this is a **biased estimator**, as the estimates are generally too low, which is why it is referred to as the [uncorrected sample standard deviation](https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation). **The bias decreases as sample size grows**, dropping off as $1/n$, and thus is most significant for small or moderate sample sizes; for $n > 75$ the bias is below $1 \%$. Thus for very large sample sizes, the **uncorrected sample standard deviation** is generally acceptable.

The **corrected sample standard deviation** is:

$$ s = \left( \frac{1}{n - 1} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \right)^{1/2} $$

The factor $n - 1$ corresponds to the number of degrees of freedom in the vector of deviations from the mean, $(x_{1} - \bar{x}, \; x_{2} - \bar{x}, \; \ldots, \; x_{n} - \bar{x})$.
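The two estimators can be sketched in a few lines of standard-library Python. The function names `sample_mean` and `sample_std` and the data values are illustrative assumptions, not part of any library; the results are cross-checked against `statistics.pstdev` (factor $1/n$) and `statistics.stdev` (factor $1/(n-1)$).

```python
import math
import statistics


def sample_mean(xs):
    """Arithmetic mean of a finite sequence."""
    return sum(xs) / len(xs)


def sample_std(xs, corrected=True):
    """Sample standard deviation.

    corrected=True  divides the sum of squared deviations by n - 1.
    corrected=False divides by n (uncorrected estimator).
    """
    n = len(xs)
    if corrected and n < 2:
        raise ValueError("the corrected estimator needs at least two values")
    mean = sample_mean(xs)
    sum_sq_dev = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(sum_sq_dev / (n - 1 if corrected else n))


# Hypothetical measurements, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s_n = sample_std(data, corrected=False)
s = sample_std(data, corrected=True)

print("uncorrected:", s_n, "library:", statistics.pstdev(data))
print("corrected:  ", s, "library:", statistics.stdev(data))
```

For any data set with at least two distinct values, the corrected estimate is the larger of the two, since the same sum of squared deviations is divided by a smaller number.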

**Note:**  
While $s^{2}$ is an unbiased estimator for the **population variance** $\sigma^{2}$, $s$ is still a biased estimator for the **population standard deviation** $\sigma$, though markedly less biased than the uncorrected sample standard deviation.
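The note can be checked exactly on a toy case, without random sampling. The sketch below assumes a hypothetical population of fair coin flips (values 0 and 1, so $\sigma^{2} = 0.25$), enumerates every possible sample of size $n = 3$, and averages each estimator over all of them. All names are illustrative.

```python
import itertools
import math

# Hypothetical population: a fair coin with outcomes 0 and 1.
population = [0.0, 1.0]
n = 3

mu = sum(population) / len(population)
sigma2 = sum((v - mu) ** 2 for v in population) / len(population)

# Every equally likely sample of size n (drawn with replacement).
samples = list(itertools.product(population, repeat=n))


def sum_sq_dev(sample):
    """Sum of squared deviations about the sample mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def expectation(estimator):
    """Exact expected value of an estimator over all samples."""
    return sum(estimator(s) for s in samples) / len(samples)


e_var_uncorrected = expectation(lambda s: sum_sq_dev(s) / n)
e_var_corrected = expectation(lambda s: sum_sq_dev(s) / (n - 1))
e_std_corrected = expectation(lambda s: math.sqrt(sum_sq_dev(s) / (n - 1)))

print("sigma^2          =", sigma2)
print("E[s_n^2]         =", e_var_uncorrected)   # (n - 1) / n * sigma^2
print("E[s^2]           =", e_var_corrected)     # equals sigma^2
print("E[s] vs sigma    =", e_std_corrected, math.sqrt(sigma2))
```

In this toy case the corrected variance is exactly unbiased, while the average of $s$ falls below $\sigma$, because the square root is a concave function.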

## Correlation

Based on [Correlation and dependence - Wikipedia](https://en.wikipedia.org/wiki/Correlation_and_dependence)

In statistics, **dependence** or **association** is any statistical relationship, whether causal or not, between two random variables or bivariate data.

In the broadest sense **correlation** is any **statistical association**, though it commonly refers to:

> The degree to which a pair of variables are **linearly related**.

Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a limited supply product and its price.

There are several correlation coefficients measuring the degree of correlation. The most common of these is the **Pearson correlation coefficient**, which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other). **Mutual information** can also be applied to measure dependence between two variables.

**Correlations** are useful because they can indicate a predictive relationship that can be exploited in practice. A predictive relationship alone, however, does not establish causality. In general:

> The presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., **correlation does not imply causation**).

In informal parlance, *correlation is synonymous with dependence*. However, when used in a technical sense, **correlation refers to any of several specific types of relationship between mean values**.

### Correlation and causality

See:
- [Correlation does not imply causation](https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation)
- [Normally distributed and uncorrelated does not imply independent](https://en.wikipedia.org/wiki/Normally_distributed_and_uncorrelated_does_not_imply_independent)

The conventional dictum that **correlation does not imply causation** means that correlation cannot be used by itself to infer a causal relationship between the variables. This dictum should not be taken to mean that correlations cannot indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with identity relations (tautologies), where no causal process exists. Consequently, a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).

A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health, or does good health lead to good mood, or both? Or does some other factor underlie both? In other words, **a correlation can be taken as evidence for a possible causal relationship, but cannot indicate what the causal relationship, if any, might be**.

## Pearson correlation coefficient

The **population Pearson correlation coefficient** $\rho_{xy}$ can be estimated by the **sample correlation coefficient** $r_{xy}$:

$$ r_{xy} = \frac{ \sum_{i = 1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{(n-1) \; s_{x} \; s_{y}} =  \frac{ \sum_{i = 1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{\left( \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \; \sum_{i = 1}^{n} (y_{i} - \bar{y})^{2} \right)^{1/2} }$$

where $s_{x}$ and $s_{y}$ are the **corrected sample standard deviations** of $x$ and $y$ respectively. If the **uncorrected sample standard deviations** $s_{x}^{'}$ and $s_{y}^{'}$ are used instead:

$$ r_{xy} = \frac{ \sum_{i = 1}^{n} x_{i} y_{i} - n \; \bar{x} \; \bar{y} }{n \; s_{x}^{'} \; s_{y}^{'}}$$
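Both forms give the same number, because the factors $n - 1$ and $n$ cancel against the matching standard deviations. A minimal sketch, with illustrative function names and made-up data:

```python
import math


def pearson_r(xs, ys):
    """Sample correlation from sums of deviations (second form of the first formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def pearson_r_uncorrected(xs, ys):
    """Same coefficient written with the uncorrected standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - n * mean_x * mean_y) / (n * sd_x * sd_y)


# Hypothetical paired measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r_a = pearson_r(xs, ys)
r_b = pearson_r_uncorrected(xs, ys)
print(r_a, r_b)
```

The second form subtracts two large, nearly equal quantities when the means are large compared with the spread, so the deviation-based form is usually the safer choice numerically.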

The **population Pearson correlation coefficient** $\rho_{xy}$ always lies in the interval $[-1, \; 1]$ (its absolute value is never larger than 1), and can be interpreted as:

- $\rho_{xy} = +1$: there exists a perfect direct (increasing) linear relationship (correlation).
- $\rho_{xy} = -1$: there exists a perfect decreasing (inverse) linear relationship (anticorrelation).
- $\rho_{xy} \in (-1, \; 1)$: it indicates the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables.
- If the variables are independent, then $\rho_{xy} = 0$, but the converse is not true because the correlation coefficient detects only linear dependencies between two variables:

$$ x, \; y \; \mathrm{are \; independent} \Rightarrow \rho_{xy} = 0 \; (x, \; y \; \mathrm{are \; uncorrelated})$$
$$ \rho_{xy} = 0 \; (x, \; y \; \mathrm{are \; uncorrelated}) \;\not\!\!\!\implies x, \; y \; \mathrm{are \; independent} $$

For example, suppose the random variable $x$ is symmetrically distributed about zero, and $y = x^{2}$. Then $y$ is completely determined by $x$, so that $x$ and $y$ are perfectly dependent, but $\rho_{xy} = 0$; they are uncorrelated.
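A finite-sample version of this example can be checked directly. The sketch assumes seven points placed symmetrically about zero; `pearson_r` is an illustrative helper, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# x is symmetric about zero and y is fully determined by x.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print("r_xy =", r)  # zero up to rounding, despite perfect dependence
```

The positive and negative halves of the parabola contribute products of opposite sign that cancel, so the linear measure sees nothing.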

![Autocorrelation values and interpretations 1](img/corr_ex_1.png)

However, in the special case when $x$ and $y$ are **jointly normal**, uncorrelatedness is equivalent to independence:

![Jointly normal](img/corr_ex_2.png)

For the case of a **linear model with a single independent variable**, the **coefficient of determination (R squared)** is the square of $r_{xy}$.

### Comments on correlation and visual inspection of data

![Autocorrelation values and interpretations 3](img/corr_ex_3.png)

The above image shows scatter plots of [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet), a set of four different pairs of variables created by Francis Anscombe. The four $y$ variables have the same mean, variance, correlation and regression line: 

$$ \bar{y} = 7.5 $$
$$ \sigma^{2} = 4.12 $$
$$ r_{xy} = 0.816 $$
$$ y = 3 + 0.5 \; x $$

However, as can be seen on the plots, the distribution of the variables is very different:

- The first one (**top left**) seems to be distributed normally, and corresponds to what one would expect when considering two variables correlated and following the assumption of normality.
- The second one (**top right**) is not distributed normally; while an obvious relationship between the two variables can be observed, it is not linear. In this case the Pearson correlation coefficient does not indicate that there is an exact functional relationship: only the extent to which that relationship can be approximated by a linear relationship.
- In the third case (**bottom left**), the linear relationship is perfect, except for one outlier which exerts enough influence to lower the correlation coefficient from 1 to 0.816.
- Finally, the fourth example (**bottom right**) shows a case in which a single outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear.

These examples indicate that **the correlation coefficient**, as a summary statistic, **cannot replace visual examination of the data**.
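The summary statistics quoted above can be recomputed with the standard library. The data values below are the quartet as commonly reproduced from Anscombe (1973); they are typed in here by hand and should be checked against the source before being reused. Helper names are illustrative.

```python
import math
import statistics

# Anscombe's quartet (values as commonly reproduced; verify against the source).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


summary = {}
for name, (xs, ys) in quartet.items():
    # statistics.variance uses the corrected (n - 1) estimator.
    summary[name] = (statistics.mean(ys), statistics.variance(ys), pearson_r(xs, ys))
    print(name, [round(v, 3) for v in summary[name]])
```

The four rows agree to about two decimal places, which is the point: the numbers alone cannot tell the four scatter plots apart.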

## Autocorrelation

Based on [Autocorrelation - Wikipedia](https://en.wikipedia.org/wiki/Autocorrelation)

It is also known as **serial correlation** and is the **correlation** of a signal with a delayed copy of itself as a function of delay.

Informally, it is the **similarity between observations** as a function of the time lag between them.

It is used as a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.

Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some fields, the term is used interchangeably with **autocovariance**.

### Estimation

For a discrete process with known mean $\mu$ and variance $\sigma^{2}$ for which we observe $n$ observations $\{x_{1},\; x_{2},\; \ldots ,\; x_{n} \}$, an estimate of the autocorrelation may be obtained as

$$ G(k) = \frac{\sum_{i = 1}^{n-k} (x_{i} - \mu)(x_{i + k} - \mu)}{(n - k) \sigma^{2}} $$

for any positive integer $k < n$. $G(k)$ is called the **autocorrelation coefficient at lag $k$**. When the true mean $\mu$ and variance $\sigma^{2}$ are known, this estimate is **unbiased**.

If the true mean and variance of the process are not known, they can be replaced by the standard formulae for the **sample mean** and **sample variance**, yielding a **biased estimate**:

$$ G(k) = \frac{\sum_{i = 1}^{n-k} (x_{i} - \bar{x})(x_{i + k} - \bar{x})}{(n - k) \; s^{2}} = \frac{\sum_{i = 1}^{n-k} (x_{i} - \bar{x})(x_{i + k} - \bar{x})}{\sum_{i = 1}^{n - k} (x_{i} - \bar{x})^{2}} $$

where:

$$ s^{2} = \frac{1}{n - k} \sum_{i = 1}^{n - k} (x_{i} - \bar{x})^{2} $$

The last definition, which uses the factor $1/(n-k)$, has less bias than the one using $1/n$. Nevertheless, the latter is the form most commonly used in the statistics literature because it has some desirable statistical properties (see pages 20 and 49-50 of **Chatfield, C. (1989). The Analysis of Time Series: An Introduction (Fourth ed.). New York, NY: Chapman & Hall**).

Sprott uses the factor $1/(n-k)$.
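A minimal sketch of the sample estimate, covering both normalizations discussed above. The function name `autocorr` and its `normalization` argument are illustrative assumptions. With `"n-k"` both sums run over the first $n - k$ terms, as in the formula above; with `"n"` the denominator sums all $n$ squared deviations, which corresponds to using the factor $1/n$ in both numerator and denominator.

```python
def autocorr(xs, k, normalization="n-k"):
    """Sample autocorrelation coefficient G(k) about the sample mean."""
    n = len(xs)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < n")
    mean = sum(xs) / n
    numerator = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k))
    terms = n - k if normalization == "n-k" else n
    denominator = sum((xs[i] - mean) ** 2 for i in range(terms))
    return numerator / denominator


# Hypothetical series that flips sign at every step.
series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

for k in range(3):
    print(k, autocorr(series, k, "n-k"), autocorr(series, k, "n"))
```

On this alternating series the $1/(n-k)$ form gives exactly $-1$ at lag 1, while the $1/n$ form is pulled towards zero by the factor $(n - k)/n$. That shrinkage is the extra bias mentioned above.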

### Interpretation

The time ordering of the measurements is irrelevant when calculating **mean** and **variance** and thus **they cannot give any information about the time evolution of a system**. 

**Autocorrelation** gives this type of information.

The estimation of the autocorrelations from a time series is straightforward as long as the lag $k$ is small compared to the total length of the time series. Therefore, estimates of autocorrelation are only reasonable for $k \ll n$.

If we plot the values $x_{i}$ versus $x_{i+k}$, the autocorrelation $G(k)$ quantifies how these points are distributed.

Cases:

- If they spread out evenly over the plane, then $G(k) = 0$, and we can say that the data is uncorrelated at lag $k$.
- If they tend to crowd along the diagonal $x_{i} = x_{i + k}$, then $G(k) > 0$.
- If they are closer to the line $x_{i} = - x_{i + k}$, then $G(k) < 0$.

The latter two cases reflect some tendency of $x_{i}$ and $x_{i + k}$ to be proportional to each other, which makes it plausible that the autocorrelation function reflects only linear correlations. In other words, we can say that the autocorrelation function $G(k)$ measures how strongly, on average, each data point is correlated with the one $k$ time steps away.
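The lag-plot picture can be made concrete by building the pairs $(x_{i}, x_{i+k})$ and computing their ordinary Pearson correlation. The sketch below uses two hypothetical series, a ramp and an alternating sequence; the helper names are illustrative.

```python
import math


def lag_pairs(xs, k):
    """Pairs (x_i, x_{i+k}) that would be drawn in a lag plot."""
    return list(zip(xs[:len(xs) - k], xs[k:]))


def pearson_r(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    return s_ab / math.sqrt(s_aa * s_bb)


ramp = [float(i) for i in range(10)]            # points lie along an increasing line
alternating = [(-1.0) ** i for i in range(8)]   # points lie on x_i = -x_{i+1}

r_ramp = pearson_r(lag_pairs(ramp, 1))
r_alt = pearson_r(lag_pairs(alternating, 1))
print("ramp:", r_ramp, "alternating:", r_alt)
```

This pairwise correlation is not identical to $G(k)$, which uses a single mean and variance for the whole series, but for long stationary series the two are usually close and have the same sign.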

### Properties

- $G(k)$ is normalized such that $G(0) = 1$
- $G(k)$ is symmetric about $k = 0$, so $G(k) = G(-k)$
- $G(k)$ falls from a value of $G(k = 0) = 1$ to a value of $G(k) = 0$ for large $k$.
- The value of $k$ at which $G(k) = 1/e \approx 37 \%$ is called the **correlation time** $\tau_{e}$. Because of the symmetry property, the full width of $G(k)$ is $2 \tau_{e}$, which is a measure of how much "memory" the system has. The reciprocal of this quantity, $1/(2 \tau_{e})$ **is an estimate of the average rate at which predictability is lost**. Thus it is sometimes called the "poor man's Lyapunov exponent" since its value is often similar to the largest Lyapunov exponent.
- If a signal is periodic in time, then $G(k)$ is periodic in the lag $k$, but it will be a decaying oscillation, in which case $\tau_{e}$ is the time for the envelope to decay to $1/e$.
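The correlation time can be read off a sample autocorrelation by scanning for the first lag at which $G(k)$ drops to $1/e$. The sketch below is one possible way to do it, not a standard routine. The test signal is an assumption made for illustration: a first-order autoregressive series $x_{t} = 0.9 \, x_{t-1} + \text{noise}$, whose theoretical autocorrelation $0.9^{k}$ crosses $1/e$ near $k \approx 9.5$.

```python
import math
import random


def autocorr(xs, k):
    """Sample autocorrelation G(k), both sums over the first n - k terms."""
    n = len(xs)
    mean = sum(xs) / n
    numerator = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k))
    denominator = sum((xs[i] - mean) ** 2 for i in range(n - k))
    return numerator / denominator


def correlation_time(xs, max_lag):
    """First lag k >= 1 with G(k) <= 1/e, or None if not reached."""
    threshold = 1.0 / math.e
    for k in range(1, max_lag + 1):
        if autocorr(xs, k) <= threshold:
            return k
    return None


# Hypothetical test signal: AR(1) process with coefficient 0.9.
rng = random.Random(0)
phi = 0.9
x = 0.0
series = []
for _ in range(5000):
    x = phi * x + rng.gauss(0.0, 1.0)
    series.append(x)

tau_e = correlation_time(series, max_lag=100)
print("estimated correlation time:", tau_e)
print("poor man's Lyapunov exponent:", 1.0 / (2 * tau_e))
```

Because the lag is an integer and the estimate is noisy, the result is only accurate to about one step. Interpolating between the two lags that bracket $1/e$ would refine it.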

![Correlogram 1](img/correlogram_ex_1.png)

> **Above:** A plot of a series of 100 random numbers concealing a sine function. **Below:** The sine function revealed in a correlogram produced by autocorrelation.

- In time series analysis, a **correlogram**, also known as an **autocorrelation plot**, is a plot of the sample autocorrelations $G(k)$ versus $k$ (the time lags). The correlogram is a commonly used tool for checking randomness in a data set. This randomness is ascertained by computing autocorrelations for data values at varying time lags. If random, such autocorrelations should be near zero for any and all time-lag separations. If non-random, then one or more of the autocorrelations will be significantly non-zero.

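- According to the **Wiener-Khinchin theorem**, the autocorrelation function $G(k)$ and the power spectrum form a Fourier transform pair: $G(k)$ is the Fourier transform of the power spectrum back into the time (lag) domain. The **power spectrum** can therefore be calculated from $G(k)$ as: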

> $$ S_{m} = G(0) + G(K) \cos{\left( \frac{2 \pi m K}{n} \right)} + 2 \sum_{k = 1}^{K-1} G(k) \cos{\left( \frac{2 \pi m k}{n} \right)}$$  
where $K$ is the maximum $k$, which you should usually take to be about $n/4$ (Davis 1986).
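The formula for $S_{m}$ can be sketched as follows. The test signal (a pure sine with a period of 16 samples), the range $m = 0, \ldots, n/2$, and the function names are assumptions made for illustration; the cutoff $K = n/4$ follows the suggestion above.

```python
import math


def autocorr(xs, k):
    """Sample autocorrelation G(k), both sums over the first n - k terms."""
    n = len(xs)
    mean = sum(xs) / n
    numerator = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k))
    denominator = sum((xs[i] - mean) ** 2 for i in range(n - k))
    return numerator / denominator


def power_spectrum(xs, max_lag):
    """S_m for m = 0 .. n/2, from the autocorrelation up to lag K = max_lag."""
    n = len(xs)
    g = [autocorr(xs, k) for k in range(max_lag + 1)]
    spectrum = []
    for m in range(n // 2 + 1):
        w = 2.0 * math.pi * m / n
        s_m = g[0] + g[max_lag] * math.cos(w * max_lag)
        s_m += 2.0 * sum(g[k] * math.cos(w * k) for k in range(1, max_lag))
        spectrum.append(s_m)
    return spectrum


n = 128
period = 16
signal = [math.sin(2.0 * math.pi * i / period) for i in range(n)]

spectrum = power_spectrum(signal, max_lag=n // 4)
peak = max(range(len(spectrum)), key=spectrum.__getitem__)
print("peak at m =", peak, "expected n / period =", n // period)
```

A larger $K$ sharpens the peak but relies on autocorrelation estimates at large lags, which are the least reliable ones.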

- $G(k)$ is the ratio of the **autocovariance** to the **variance** of the data.

> The autocorrelation coefficient at lag $k$, $G(k)$, is given by:

$$ G(k) = \frac{c(k)}{c(0)} $$

where $c(k)$ is the **autocovariance function**:

$$ c(k) = \frac{1}{(n - k)} \sum_{i = 1}^{n-k} (x_{i} - \bar{x})(x_{i + k} - \bar{x}) $$

and $c(0)$ is the **variance function**:

$$ c(0) = \frac{1}{n - k} \sum_{i = 1}^{n - k} (x_{i} - \bar{x})^{2}$$


### Autocorrelations, noise and chaos

> The autocorrelation function, $G(k)$, is a linear measure, each term of which (the lag $k$ autocorrelation coefficient) measures the extent to which a plot of $x_{i}$ versus $x_{i + k}$ is a straight line.

- Stochastic processes have decaying autocorrelations but the rate of decay depends on the properties of the process.
- Autocorrelations of signals from deterministic chaotic systems typically also decay exponentially with increasing lag. 

> Autocorrelations are not characteristic enough to distinguish random from deterministic chaotic signals.

- Many nonlinear systems, such as the logistic map for $r = 4$, have no linear correlation at nonzero lags.
- Uncorrelated data should have $G(k)$ within $\pm 2/\sqrt{n}$ of zero (two standard deviations) for about $95 \%$ of the $k$ values. (Makridakis et al. 1983).
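Both points can be illustrated together: iterate the logistic map at $r = 4$, estimate $G(k)$ for the first few lags, and count how many values fall inside the $\pm 2/\sqrt{n}$ band. The initial condition, the transient length, and the number of lags are arbitrary choices made for this sketch.

```python
import math


def autocorr(xs, k):
    """Sample autocorrelation G(k), both sums over the first n - k terms."""
    n = len(xs)
    mean = sum(xs) / n
    numerator = sum((xs[i] - mean) * (xs[i + k] - mean) for i in range(n - k))
    denominator = sum((xs[i] - mean) ** 2 for i in range(n - k))
    return numerator / denominator


def logistic_series(n, x0=0.1, r=4.0, transient=100):
    """n iterates of x -> r x (1 - x) after discarding a transient."""
    x = x0
    for _ in range(transient):
        x = r * x * (1.0 - x)
    values = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


n = 2000
series = logistic_series(n)
band = 2.0 / math.sqrt(n)

g = [autocorr(series, k) for k in range(1, 21)]
inside = sum(abs(value) <= band for value in g) / len(g)

print("band: +/-", band)
print("fraction of lags inside the band:", inside)
```

Although each value is an exact function of the previous one, the autocorrelation looks like that of uncorrelated noise, which is why it cannot separate chaos from randomness.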

### Serial dependence

**Serial dependence** is closely linked to the notion of autocorrelation, but represents a distinct concept. In particular, it is possible to have serial dependence but no (linear) correlation. In some fields however, the two terms are used as synonyms.

A time series of a random variable has serial dependence if the value at some time $t$ in the series is statistically dependent on the value at another time $s$. A series is serially independent if there is no dependence between any pair.

If a time series $\left\{x_{t}\right\}$ is **stationary**, then statistical dependence between the pair $(x_{t}, \; x_{s})$ would imply that there is statistical dependence between all pairs of values at the same lag $\tau = s-t$.

## Autocovariance

In probability theory and statistics, given a stochastic process, the **autocovariance** is a function that gives the **covariance** of the process with itself at pairs of time points. **Autocovariance** is closely related to the **autocorrelation** of the process in question.

## Cross-correlation

In statistics, the term cross-correlation is sometimes used to refer to the covariance $\mathrm{cov}(X, Y)$ between two random vectors $X$ and $Y$.

In signal processing, cross-correlation (sometimes called "cross-covariance") is a measure of the similarity of two signals as a function of the displacement of one relative to the other. It is also known as a sliding dot product or sliding inner product. It is frequently used to find relevant features in an unknown signal by comparing it with a known one, for example when searching a long signal for a shorter, known feature. It has applications in pattern recognition, single particle analysis, electron tomography, averaging, cryptanalysis, and neurophysiology. The cross-correlation is similar in nature to the convolution of two functions. In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of zero, and its size will be the signal energy.
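The sliding dot product can be sketched in a few lines. The signal, the template, and the function name `cross_correlate` are made up for illustration; only offsets where the template fits entirely inside the signal are evaluated, and no normalization is applied.

```python
def cross_correlate(signal, template):
    """Sliding dot product of a short template along a longer signal."""
    m = len(template)
    return [
        sum(signal[i + j] * template[j] for j in range(m))
        for i in range(len(signal) - m + 1)
    ]


# Hypothetical long signal containing the feature [1, 2, 3] at index 6.
signal = [0, 0, 1, 0, 0, 0, 1, 2, 3, 0, 0, 1, 0]
template = [1, 2, 3]

scores = cross_correlate(signal, template)
best_offset = max(range(len(scores)), key=scores.__getitem__)

# Energy of the template: its cross-correlation with itself at zero lag.
energy = cross_correlate(template, template)[0]

print("scores:", scores)
print("best offset:", best_offset, "score:", scores[best_offset], "energy:", energy)
```

The raw score favours loud regions of the signal as well as well-matched ones, so practical detectors often normalize each window by its own energy before comparing.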